(¢|Q|c) 


A Course in Quantum Computing 
for the 
Community College 
Volume 1 


Michael Loceff 
Foothill College 
mailto:LoceffMichael@fhda.edu


© 2015 Michael Loceff 
All Rights Reserved 


This work is licensed under the Creative Commons 
Attribution-NonCommercial-NoDerivatives 4.0 International License. 


To view a copy of this license, visit 


http://creativecommons.org/licenses/by-nc-nd/4.0/.




Contents 


0 Introduction
0.1 Welcome to Volume One
0.1.1 About this Volume
0.1.2 About this Introduction
0.2 Bits and Qubits
0.2.1 More Information than Zero and One
0.2.2 The Probabilistic Nature of Qubits
0.2.3 Quantum Mechanics – The Tool that Tames the Beast
0.2.4 Sneak Peek at the Coefficients $\alpha$ and $\beta$
0.3 The Promise of Quantum Computing
0.3.1 Early Results
0.3.2 The Role of Computer Scientists
0.4 The Two Sides of Quantum Computer Science
0.4.1 Circuit Design
0.4.2 Algorithms
0.5 Perspective
0.6 Navigating the Topics


1 Complex Arithmetic
1.1 Complex Numbers for Quantum Computing
1.2 The Field of Complex Numbers
1.2.1 The Real Numbers Just Don’t Cut It
1.2.2 The Definition of $\mathbb{C}$
1.2.3 The Complex Plane
1.2.4 Operations on Complex Numbers
1.2.5 $\mathbb{C}$ is a Field
1.3 Exploring the Complex Plane
1.3.1 Complex Numbers as Ordered Pairs of Real Numbers
1.3.2 Real and Imaginary Axes and Polar Representation
1.3.3 Complex Conjugate and Modulus
1.4 Transcendental Functions and Their Identities
1.4.1 The Complex Exponential Function Part 1: Pure Imaginary Case
1.4.2 Real sin() and cos() in Terms of the Complex Exponential
1.4.3 Complex Exponential Part 2: Any Complex Number
1.4.4 The Complex Trigonometric Functions
1.4.5 Polar Relations Expressed Using the Exponential
1.5 Roots of Unity
1.5.1 N Distinct Solutions to $z^N = 1$
1.5.2 Euler’s Identity
1.5.3 Summation Notation
1.5.4 Summing Roots-of-Unity and the Kronecker Delta

2 Real Vector Spaces
2.1 Vector Spaces for Quantum Computing
2.2 Vectors and Vector Spaces
2.2.1 Standard Equipment: The Axioms
2.2.2 Optional Equipment: Inner Products, Modulus and Orthogonality
2.2.3 A Bigger Vector Space: $\mathbb{R}^3$
2.2.4 Preview of a Complex Vector Space: $\mathbb{C}^2$
2.3 Bases for a Vector Space
2.3.1 Linear Combination (or Superposition)
2.3.2 The Notion of Basis
2.3.3 Coordinates of Vectors
2.3.4 Independence of Basis (or Not?)
2.4 Subspaces
2.5 Higher Dimensional Vector Spaces
2.6 More Exercises

3 Matrices
3.1 Matrices in Quantum Computing
3.2 Definitions
3.2.1 Notation
3.3 Matrix Multiplication
3.3.1 Row × Column
3.3.2 Definition of Matrix Multiplication
3.3.3 Product of a Vector by a Matrix
3.4 Matrix Transpose
3.5 Matrix Addition and Scalar Multiplication
3.6 Identity Matrix and Zero Matrix
3.7 Determinants
3.7.1 Determinant of a 2 × 2 Matrix
3.7.2 Determinant of a 3 × 3 Matrix
3.7.3 Determinant of an n × n Matrix
3.7.4 Determinants of Products
3.8 Matrix Inverses
3.9 Matrix Equations and Cramer’s Rule
3.9.1 Systems of Linear Equations
3.9.2 Cramer’s Rule

4 Hilbert Space
4.1 Complex Vector Spaces for Quantum Computing
4.1.1 The Vector Nature of a Qubit
4.1.2 The Complex Nature of a Qubit
4.2 The Complex Vector Space, $\mathbb{C}^n$
4.3 The Complex Inner Product
4.3.1 Norm and Distance
4.3.2 Expansion Coefficients
4.4 Hilbert Space
4.4.1 Definitions
4.4.2 Old Friends and New Acquaintances
4.4.3 Some Useful Properties of Hilbert Spaces
4.5 Rays, Not Points
4.5.1 Modeling Quantum Systems
4.5.2 0 is not a Quantum State
4.5.3
4.6


5 Linear Transformations
5.1 Linear Transformations for Quantum Computing
5.1.1 A Concept More Fundamental Than the Matrix
5.1.2 The Role of Linear Transformations in Quantum Computing
5.2 Definitions and Examples
5.2.1 Actions as well as Name Changes
5.2.2 Formal Definition of Linear Transformation
5.3 The Special Role of Bases
5.3.1 Application: Rotations in Space
5.4 The Matrix of a Linear Transformation
5.4.1 From Matrix to Linear Transformation
5.4.2 From Linear Transformation to Matrix
5.4.3 Dependence of a Matrix on Basis
5.4.4 The Transformation in an Orthonormal Basis
5.5 Some Special Linear Transformations for Quantum Mechanics
5.5.1 The Adjoint of a Matrix
5.5.2 Unitary Operators
5.5.3 Hermitian Operators
5.6 Enter the Quantum World

6 The Experimental Basis of Quantum Computing
6.1 The Physical Underpinning for Spin 1/2 Quantum Mechanics
6.2 Physical Systems and Measurements
6.2.1 Quantum Mechanics as a Model of Reality
6.2.2 The Physical System, $\mathscr{S}$
6.2.3 Electron Spin as a Testbed for Quantum Mechanics
6.3 A Classical Attempt at Spin 1/2 Physics
6.3.1 An Imperfect Picture of Spin
6.3.2 A Naive Quantitative Definition of Electron Spin
6.3.3 Spherical Representation
6.4 Refining Our Model: Experiment #1
6.4.1 The Experiment
6.4.2 The Actual Results
6.4.3 The Quantum Reality
6.4.4 A Follow-Up to Experiment #1
6.4.5 Results of the Follow-Up


6.4.6 Quantum Mechanics Lesson #1
6.4.7 First Adjustment to the Model
6.5 Refining Our Model: Experiment #2
6.5.1 The Experiment
6.5.2 The Actual Results
6.5.3 The Quantum Reality
6.5.4 A Follow-Up to Experiment #2
6.5.5 Results of the Follow-Up
6.5.6 Quantum Mechanics Lesson #2
6.5.7 Second Adjustment to the Model
6.6 Refining Our Model: Experiment #3
6.6.1 What we Know
6.6.2 The Experiment
6.6.3 The Actual Results
6.6.4 The Quantum Reality
6.6.5 Quantum Mechanics Lesson #3
6.6.6 Third Adjustment to the Model
6.7 Onward to Formalism


7 Time Independent Quantum Mechanics
7.1 Quantum Mechanics for Quantum Computing
7.2 The Properties of Time-Independent Quantum Mechanics
7.2.1 The State Space of a Physical System
7.3 The First Postulate of Quantum Mechanics
7.3.1 Trait #1 (The State Space)
7.3.2 The Fundamental State Space for Quantum Computing
7.4 The Second Postulate of Quantum Mechanics
7.4.1 Trait #2 (The Operator for an Observable)
7.4.2 The Observable $S_z$
7.5 The Third Postulate of Quantum Mechanics
7.5.1 Trait #3 (The Eigenvalues of an Observable)
7.5.2 Eigenvectors and Eigenvalues
7.5.3 The Eigenvalues and Eigenvectors of $S_z$


7.6 Computing Eigenvectors and Eigenvalues
7.6.1 The Eigenvalues and Eigenvectors of $S_y$
7.6.2 The Eigenvalues and Eigenvectors of $S_x$
7.6.3 Summary of Eigenvectors and Eigenvalues for Spin-1/2 Observables
7.7 Observables and Orthonormal Bases
7.7.1 Trait #4 (Real Eigenvalues and Orthonormal Eigenvectors)
7.7.2 Using the x- or y-Basis
7.7.3 General States Expressed in Alternate Bases
7.8 The Completeness (or Closure) Relation
7.8.1 Orthonormal Bases in Higher Dimensions
7.8.2 Trait #5 (Closure Relation)
7.9 The Fourth Postulate of Quantum Mechanics
7.9.1 Trait #6 (Probability of Outcomes)
7.10 The Fifth Postulate of Quantum Mechanics
7.10.1 Trait #7 (Post-Measurement Collapse)
7.11 Summary of What Quantum Mechanics Tells Us About a System
7.12 Dirac’s Bra-ket Notation
7.12.1 Kets and Bras
7.12.2 The Adjoint of an Operator
7.12.3 The Adjoint Conversion Rules
7.12.4 Trait #8 (Adjoint Conversion Rules)
7.13 Expectation Values
7.13.1 The Mean of the Experiments
7.13.2 Defining Expectation Value
7.13.3 Computing Expectation Value
7.13.4 Trait #9 (Computing an Expectation Value)
7.13.5 Expectation Values in Spin-1/2 Systems
7.14 It’s About Time


8 Time Dependent Quantum Mechanics
8.1 Time Evolution in Quantum Computing
8.2 The Hamiltonian
8.2.1 Total Energy
8.2.2 From Classical Hamiltonian to Quantum Hamiltonian
8.3 The Hamiltonian for a Spin-1/2 System


8.3.1 A Classical Hamiltonian
8.3.2 A Quantum Hamiltonian
8.4 The Energy Eigenkets
8.4.1 Relationship Between H and $S_z$
8.4.2 Allowable Energies
8.5 Sixth Postulate of Quantum Mechanics
8.5.1 The Schrödinger Equation
8.5.2 The Evolution of Spin in a Constant Magnetic Field
8.5.3 Stationary States
8.5.4 General (Non-Stationary) States
8.5.5 General Technique for Computing Time-Evolved States
8.6 Larmor Precession
8.6.1 The Time-Evolved Spin State in a Uniform B-Field
8.6.2 Evolution of the Spin Expectation Values
8.6.3 Summary of Larmor Precession
8.7 The End and the Beginning

9 The Qubit
9.1 Bits and Qubits as Vector Spaces
9.2 Classical Computation Models – Informal Approach
9.2.1 Informal Definition of Bits and Gates
9.3 Classical Computation Models – Formal Approach
9.3.1 A Miniature Vector Space
9.3.2 Formal Definition of a (Classical) Bit
9.3.3 Formal Definition of a (Classical) Logical Operator
9.4 The Qubit
9.4.1 Quantum Bits
9.4.2 Quantum Bit Values
9.4.3 Usual Definition of Qubit and its Value
9.4.4 Key Difference Between Bits and Qubits
9.5 Quantum Operators (Unary)
9.5.1 Definition and Notation
9.5.2 Case Study – Our First Quantum Gate, QNOT
9.5.3 The Phase Flip, Z
9.5.4 The Bit-and-Phase Flip Operator, Y
9.5.5 The Hadamard Gate, H
9.5.6 Phase-Shift Gates, S, T and $R_\theta$
9.6 Putting Unary Gates to Use
9.6.1 Basis Conversion
9.6.2 Combining Gates
9.7 The Bloch Sphere
9.7.1 Introduction
9.7.2 Rewriting $|\psi\rangle$
9.7.3 The Expectation Vector for $|\psi\rangle$
9.7.4 Definition of the Bloch Sphere

10 Tensor Products
10.1 Tensor Product for Quantum Computing
10.2 The Tensor Product of Two Vector Spaces
10.2.1 Definitions
10.2.2 Tensor Coordinates from Component-Space Coordinates
10.3 Linear Operators on the Tensor Product Space
10.3.1 Separable Operators
10.3.2 The Matrix of a Separable Operator
10.3.3 The Matrix of a General Operator
10.3.4 Food for Thought

11 Two Qubits and Binary Quantum Gates
11.1 The Jump from One to Two
11.2 The State Space for Two Qubits
11.2.1 Definition of a Two Quantum Bit (“Bipartite”) System
11.2.2 The Preferred Bipartite CBS
11.2.3 Separable Bipartite States
11.2.4 Alternate Bipartite Bases
11.2.5 Non-Separable Bipartite Tensors
11.2.6 Usual Definition of Two Qubits and their Values
11.3 Fundamental Two-Qubit Logic Gates
11.3.1 Binary Quantum Operators
11.3.2 General Learning Example
11.3.3 Quantum Entanglement
11.3.4 The Controlled-NOT (CNOT) Gate
11.3.5 The Second-Order Hadamard Gate
11.4 Measuring Along Alternate Bases
11.4.1 Measuring Along the x-Basis
11.4.2 Measuring Along any Separable Basis
11.4.3 The Entire Circuit Viewed in Terms of an Alternate Basis
11.4.4 Non-Separable Basis Conversion Gates
11.5 Variations on the Fundamental Binary Qubit Gates
11.5.1 Other Controlled Logic Gates
11.5.2 More Separable Logic Gates
11.6 The Born Rule and Partial Collapse
11.6.1 The Born Rule for a Two-Qubit System
11.7 Multi-Gate Circuits
11.7.1 A Circuit that Produces Bell States
11.7.2 A Circuit that Creates an Upside-Down CNOT
11.8 More than Two Qubits
11.8.1 Order-3 Tensor Products
11.8.2 Three Qubit Systems
11.8.3 Tripartite Bases and Separable States
11.8.4 End of Lesson


12 First Quantum Algorithms
12.1
12.2 Superdense Coding
12.2.1 Sending Information by Qubit
12.2.2 The Superdense Coding Algorithm
12.3 Quantum Teleportation
12.3.1 The Quantum Teleportation Algorithm
12.4 Introduction to Quantum Oracles
12.4.1 Boolean Functions and Reversibility
12.4.2 The Quantum Oracle of a Boolean Function
12.5 Deutsch’s Problem
12.5.1 Definitions and Statement of the Problem
12.5.2 Deutsch’s Algorithm
12.6 n Qubits and More Algorithms


13 Multi-Qubit Systems and Algorithms
13.1 Moving Up from 2 Qubits to n Qubits
13.2 General Tensor Products
13.2.1 Recap of Order-2 Tensor Products
13.2.2 Recap of Order-3 Tensor Products
13.2.3 Higher Order Tensor Products
13.3 n-Qubit Systems
13.3.1 Recap of Three Qubits
13.3.2 Three Qubit Logic Gates; the Toffoli Gate
13.3.3 n Qubits
13.3.4 n Qubit Logic Gates
13.3.5 Oracles for n Qubit Functions
13.4 Significant Deterministic Speed-Up: The Deutsch-Jozsa Problem
13.4.1 Deutsch-Jozsa Algorithm
13.4.2 Quantum vs. Classical Time Complexity
13.4.3 Alternate Proof of the Deutsch-Jozsa Algorithm
13.5 True Non-Deterministic Speed-Up: The Bernstein-Vazirani Problem
13.5.1 The Bernstein-Vazirani Algorithm
13.6 Generalized Born Rule
13.6.1 Trait #15'' (Generalized Born Rule for (n + m)th order States)
13.7 Towards Advanced Quantum Algorithms

14 Probability Theory
14.1 Probability in Quantum Computing
14.1.1 Probability for Classical Algorithms
14.1.2 Probability for Quantum Algorithms
14.2 The Essential Vocabulary: Events vs. Probabilities
14.2.1 Events
14.2.2 Probabilities
14.3 The Quantum Coin Flip
14.4 Experimental Outcomes and the Sample Space
14.4.1 Outcomes
14.4.2 Requirements when Defining Outcomes
14.4.3 An Incorrect Attempt at Defining Outcomes
14.4.4
14.4.5 The Sample Space
14.4.6
14.5 Alternate Views of the Ten Qubit Coin Flip
14.6 Mod-2 Vectors in Probability Computations for Quantum Algorithms
14.6.1 Ordinary Linear Independence and Spanning
14.6.2 Linear Independence and Spanning in the Mod-2 Sense
14.6.3 Probability Warm-Up: Counting
14.7 Fundamental Probability Theory
14.7.1 The Axioms
14.7.2 Assigning Probabilities in Finite, Equiprobable, Sample Spaces
14.8 Big Theorems and Consequences of the Probability Axioms
14.8.1
14.8.2 Conditional Probability and Bayes’ Law
14.8.3 Statistical Independence
14.8.4 Other Forms and Examples
14.9 Wedge and Vee Notation and Lecture Recap
14.9.1 Disjoint Events
14.9.2 Partition of the Sample Space by Complements
14.9.3 Bayes’ Law
14.9.4 Statistical Independence
14.10 Application to Deutsch-Jozsa
14.10.1 Sampling with Replacement
14.10.2 Analysis Given a Balanced f
14.10.3 Completion of Sampling with Replacement for Unknown f
14.10.4 Sampling without Replacement
14.11 A Condition for Constant Time Complexity in Non-Deterministic Algorithms
14.11.1 Non-Deterministic Algorithms
14.11.2 Preview of Time Complexity – An Algorithm’s Dependence on Size
14.11.3 Looping Algorithms
14.11.4 Probabilistic Algorithms with Constant Time Complexity
14.11.5 A Constant Time Condition for Looping Algorithms

15 Computational Complexity
15.1 Computational Complexity in Quantum Computing
15.2 An Algorithm’s Sensitivity to Size
15.2.1 Some Examples of Time Complexity
15.2.2 Time Complexity vs. Space Complexity
15.2.3 Notation
15.3 Big-O Growth
15.3.1 Conflicting Ways to Measure an Algorithm
15.3.2 Definition of Big-O
15.3.3 Common Terminology for Certain Big-O Growth Rates
15.3.4 Factors and Terms We Can Ignore
15.4 $\Omega$ Growth
15.4.1 Definition of $\Omega$
15.5 $\Theta$ Growth
15.6 Little-o Growth
15.7 Easy vs. Hard
15.8 Wrap-Up

16 Computational Basis States and Modular Arithmetic
16.1 Different Notations Used in Quantum Computing
16.2 Notation and Equivalence of Three Environments
16.2.1 First Environment – n-qubit Hilbert Space, $\mathcal{H}_{(n)}$
16.2.2 The Second Environment: The Finite Group $(\mathbb{Z}_2)^n$
16.2.3 The Third Environment: The Finite Group $\mathbb{Z}_{2^n}$ with $\oplus$ Arithmetic
16.2.4 Interchangeable Notation of $\mathcal{H}_{(n)}$, $(\mathbb{Z}_2)^n$ and $(\mathbb{Z}_{2^n}, \oplus)$

17 Quantum Oracles
17.1 Higher Dimensional Oracles and their Time Complexity
17.2 Simplest Oracle: a Boolean Function of One Bit
17.2.1 Circuit and Initial Remarks
17.2.2 A Two-Qubit Oracle’s Action on the CBS
17.2.3 Case Study #1
17.2.4 Case Study #2
17.2.5 Remaining Cases #3 and #4
17.3 Integers Mod-2 Review
17.3.1 The Classical f at the Heart of an Oracle
17.3.2 Mod-2 Notation for $\{0, 1\}^n$
17.3.3 $\oplus$ Inside the Ket
17.4 Intermediate Oracle: f is a Boolean Function of Multiple Input Bits
17.4.1
17.4.2 An (n + 1)-Qubit Oracle’s Action on the CBS
17.4.3 Analyzing $U_f$ for x = 0
17.4.4 Analyzing $U_f$ for General x
17.5 Advanced Oracle: f is a Multi-Valued Function of Multiple Input Bits
17.5.1 Circuit and Vocabulary
17.5.2 Smallest Advanced Oracle: m = 2 (Range of $f \subseteq \mathbb{Z}_{2^2}$)
17.5.3 The 4 × 4 Sub-Matrix for a Fixed x
17.5.4 Easy Way to “See” Unitarity
17.5.5 The General Advanced Oracle: Any m (Range of $f \subseteq \mathbb{Z}_{2^m}$)
17.6 The Complexity of a Quantum Algorithm Relative to the Oracle


18 Simon’s Algorithm for Period-Finding
18.1 The Importance of Simon’s Algorithm
18.2 Periodicity
18.2.1 Ordinary Periodicity
18.2.2 $(\mathbb{Z}_2)^n$ Periodicity
18.2.3 $\mathbb{Z}_{2^n}$ Periodicity
18.2.4 1-to-1, 2-to-1 and n-to-1
18.3 Simon’s Problem
18.4 Simon’s Quantum Circuit Overview and the Master Plan
18.4.1 The Circuit
18.4.2 The Plan
18.5 The Circuit Breakdown
18.6 Circuit Analysis Prior to Conceptual Measurement: Point B
18.6.1 Hadamard Preparation of the A Register
18.6.2 The Quantum Oracle on CBS Inputs
18.6.3 The Quantum Oracle on Hadamard Superposition Inputs
18.6.4 Partitioning the Domain into Cosets
18.6.5 Rewriting the Output of the Oracle’s B Register
18.7 Analysis of the Remainder of the Circuit: Measurements
18.7.1 Hypothetical Measurement of the B Register
18.7.2 Effect of a Final Hadamard on A Register


18.7.3 The Orthogonality of A Register Output Relative to the Unknown Period a
18.7.4 Foregoing the Conceptual Measurement
18.8 Circuit Analysis Conclusion
18.9 Simon’s Algorithm
18.9.1 Producing n − 1 Linearly Independent Vectors
18.9.2 The Algorithm
18.9.3 Strange Behavior
18.10 Time Complexity of the Quantum Algorithm
18.10.1 Producing n − 1 Linearly-Independent $w_k$ in Polynomial Time – Argument 1
18.10.2 Proof of Theorem Used by Argument 1
18.10.3 Summary of Argument 1
18.10.4 Producing n − 1 Linearly-Independent $w_k$ in Polynomial Time – Argument 2
18.10.5 Proof of Theorem Used by Argument 2
18.10.6 Summary of Argument 2
18.10.7 Discussion of the Two Proofs’ Complexity Estimates
18.11 The Hidden Classical Algorithms and Their Cost
18.11.1 Unaccounted for Steps
18.12 Solving Systems of Mod-2 Equations
18.12.1 Gaussian Elimination and Back Substitution
18.12.2 Gaussian Elimination
18.12.3 Back Substitution
18.12.4 The Total Cost of the Classical Techniques for Solving Mod-2 Systems
18.13 Applying GE and Back Substitution to Simon’s Problem
18.13.1 Linear Independence
18.13.2 Completing the Basis with an nth Vector Not Orthogonal to a
18.13.3 Using Back-Substitution to Close the Deal
18.13.4 The Full Cost of the Hidden Classical Algorithms
18.14 Adjusted Algorithm
18.14.1 New Linear Independence Step
18.14.2 New Solution of System Step
18.14.3 Cost of Adjusted Implementation of Simon’s Algorithm
18.15 Classical Complexity of Simon’s Problem
18.15.1 Classical Deterministic Cost
18.15.2 Classical Probabilistic Cost


19 Real and Complex Fourier Series
19.1 The Classical Path to Quantum Fourier Transforms
19.2 Periodic Functions and Their Friends
19.2.1 Periodic Functions over $\mathbb{R}$
19.2.2 Functions with Bounded Domain or Compact Support
19.2.3 The Connection Between Periodicity and Bounded Domain
19.3 The Real Fourier Series of a Real Periodic Function
19.3.1 Definitions
19.3.2 Interpretation of the Fourier Series
19.3.3 Example of a Fourier Series
19.4 The Complex Fourier Series of a Periodic Function
19.4.1 Definitions
19.4.2 Computing the Complex Fourier Coefficients
19.5 Periods and Frequencies
19.5.1 The Frequency of any Periodic Function
19.5.2 Ordinary Frequency
19.5.3 Angular Frequency

20 The Continuous Fourier Transform
20.1 From Series to Transform
20.2 Motivation and Definitions
20.2.1 Non-Periodic Functions
20.2.2 Main Result of Complex Fourier Series
20.2.3 Tweaking the Complex Fourier Series
20.2.4 Definition of the Fourier Transform
20.2.5 The Inverse Fourier Transform
20.2.6 Real vs. Complex, Even vs. Odd
20.2.7 Conditions for a Function to Possess a Fourier Transform
20.3 Learning to Compute
20.3.1 Example 1: Rectangular Pulse
20.3.2 Example 2: Gaussian
20.4 Interlude: The Delta Function
20.4.1 Characterization of the Delta Function


20.4.2 The Delta Function as a Limit of Rectangles
20.4.3 The Delta Function as a Limit of Exponentials
20.4.4 Sifting Property of the Delta Function
20.5 Fourier Transforms Involving $\delta(x)$
20.5.1 Example 3: A Constant
20.5.2 Example 4: A Cosine
20.5.3 Example 5: A Sine
20.6 Properties of the Fourier Transform
20.6.1 Translation Invariance
20.6.2 Plancherel’s Theorem
20.6.3 Convolution
20.6.4 The Convolution Theorem
20.7 Period and Frequency in Fourier Transforms
20.8 Applications
20.8.1 What’s the FT Used For?
20.8.2 The Uncertainty Principle
20.9 Summary

21 The Discrete and Fast Fourier Transforms
21.1 From Continuous to Discrete
21.2 Motivation and Definitions
21.2.1 Functions Mapping $\mathbb{Z}_N \to \mathbb{C}$
21.2.2 Defining the DFT
21.3 Matrix Representation of the DFT
21.4 Properties of DFT
21.4.1 Convolution of Two Vectors
21.4.2 Translation Invariance (Shift Property) for Vectors
21.4.3 Computational Complexity of DFT
21.5 Period and Frequency in Discrete Fourier Transforms
21.6 A Cost-Benefit Preview of the FFT
21.6.1
21.6.2
21.7 Recursive Equation for FFT
21.7.1 Splitting f into $f^{\text{even}}$ and $f^{\text{odd}}$
21.7.2 Nth Order DFT in Terms of Two (N/2)th Order DFTs


21.7.3 Danielson-Lanczos Recursion Relation
21.7.4 Code Samples: Recursive Algorithm
21.8 A Non-Recursive, N log N Solution that Defines the FFT
21.8.1 The High-Level FFT Method
21.8.2 Bit-Reversal
21.8.3 Rebuilding from the Bit-Reversed Array
21.8.4 Normalization
21.8.5 Overall Complexity
21.8.6 Software Testing

22 The Quantum Fourier Transform
22.1 From Classical Fourier Theory to the QFT
22.2 Definitions
22.2.1 From $\mathbb{C}^{2^n}$ to $\mathcal{H}_{(n)}$
22.2.2 Approaches to Operator Definition
22.2.3 Review of Hadamard Operator
22.2.4 Defining the QFT
22.3 Features of the QFT
22.3.1
22.3.2 A Comparison between QFT and H
22.3.3 The Quantum Fourier Basis
22.4 The QFT Circuit
22.4.1
22.4.2 The Math that Leads to the Circuit: Factoring the QFT
22.4.3 The QFT Circuit from the Math: n = 3 Case Study
22.4.4 The QFT Circuit from the Math: General Case
22.5 Computational Complexity of QFT


23 Shor’s Algorithm 
23.1 The Role of Shor’s Algorithms in Computing. ............. 


23.1.1 
23.1.2 
23.1.3 


Context for The Algorithms ................... 
Period Finding and Factoring « ..4. 466+ eee eee aoe wo 
The Period Finding Problem and its Key Idea ......... 


Boe WjeChive PeMOdiGty 2 oe ee oe we ee ee SL ered ewe 


23.2.1 
Diovtect 


Pumetions oF thie Integers 2... 2 se ee eh ee ke Se ee 


Functions of the Group Zyy . . 2. 1 ee 


23.2.3 Discussion of Injective Periodicity
23.3 Shor’s Periodicity Problem
23.3.1 Definitions and Recasting the Problem
23.3.2 The $\mathbb{Z}_N$ - $(\mathbb{Z}_2)^n$ - CBS Connection
23.4 Shor’s Quantum Circuit Overview and the Master Plan
23.4.1
23.4.2
23.5 The Circuit Breakdown
23.6 Circuit Analysis Prior to Conceptual Measurement: Point B
23.6.1 The Hadamard Preparation of the A register
23.6.2 The Quantum Oracle
23.6.3 The Quantum Oracle on Hadamard Superposition Inputs
23.7 Fork-in-the Road: An Instructional Case Followed by the General Case
23.8 Intermezzo - Notation for GCD and Coprime
23.8.1 Greatest Common Divisor
23.8.2 Coprime (Relatively Prime)
23.9 First Fork: Easy Case ($a \mid N$)
23.9.1 Partitioning the Domain into Cosets
23.9.2 Rewriting the Output of the Oracle
23.9.3 Implication of a Hypothetical Measurement of the B register
23.9.4 Effect of a Final QFT on the A Register
23.9.5 Computation of Final Measurement Probabilities (Easy Case)
23.9.6 STEP I: Identify a Special Set of $a$ Elements, $C = \{y_c\}_{c=0}^{a-1}$ of Certain Measurement Likelihood
23.9.7 Step II: Observe that Each $cm$ Will be Measured with Equal Likelihood
23.9.8 Step III: Prove that a Random Selection from $[0, a-1]$ will be Coprime-to-$a$ 50% of the Time
23.9.9 Step IV: Observe that a $y = cm$ Associated with $c$ Coprime-to-$a$ Will be Measured with Probability 1/2
23.9.10 Step V: Observe that $y_c$ Associated with $c$ Coprime to $a$ Will be Measured in Constant Time
23.9.11 Algorithm and Complexity Analysis (Easy Case)
23.10 Second Fork: General Case (We do not Assume $a \mid N$)
23.10.1 Partitioning the Domain into Cosets
23.10.2 Rewriting the Output of the Oracle’s output
23.10.3 Implication of a Hypothetical Measurement of the B register
23.10.4 Effect of a Final QFT on the A Register
23.10.5 Computation of Final Measurement Probabilities (General Case)
23.10.6 STEP I: Identify (Without Proof) a Special Set of $a$ Elements, $C = \{y_c\}_{c=0}^{a-1}$ of High Measurement Likelihood
23.10.7 STEP II: Prove that the Values in $C = \{y_c\}_{c=0}^{a-1}$ Have High Measurement Likelihood
23.10.8 STEP III: Associate $\{y_c\}_{c=0}^{a-1}$ with $\{c/a\}_{c=0}^{a-1}$
23.10.9 STEP IV: Describe an $O(\log^3 N)$ Algorithm that Will Produce $c/a$ from $y_c$
23.10.10 Step V: Measure $y_c$ Associated with $c$ Coprime to $a$ in Constant Time
23.10.11 Algorithm and Complexity Analysis (General Case)
23.10.12 Epilogue on Shor’s Period-Finding Algorithm


24 Euclidean Algorithm and Continued Fractions
24.1 Ancient Algorithms for Quantum Computing
24.2 The Euclidean Algorithm
24.2.1 Greatest Common Divisor
24.2.2
24.2.3 The Euclidean Algorithm, EA(P, Q)
24.2.4
24.2.5 Time Complexity of EA
24.3 Convergents and the Continued Fraction Algorithm
24.3.1 Continued Fractions
24.3.2 Computing the CFA $a_k$s Using the EA if $x$ is Rational
24.3.3 A Computer Science Method for Computing the $a_k$s of Continued Fractions
24.3.4 Convergents of a Continued Fraction
24.3.5 An Algorithm for Computing the Convergents $\{n_k/d_k\}$
24.3.6 Easy Properties of the Convergents
24.3.7 CFA: Our Special Brand of Continued Fraction Algorithm


25 From Period-Finding to Factoring
25.1 Period Finding to Factoring to RSA Encryption
25.2 The Problem and Two Classically Easy Cases
25.3 A Sufficient Condition
25.4 A Third Easy Case and Order-Finding in $\mathbb{Z}_M$
25.5 Using the Order of $y$ to Find the $x$ of our “Sufficient Condition”
25.6 The Time Complexity of $f(x) = y^x \pmod{M}$
25.7 The Complexity Analysis
25.7.1 Complexity of Step 1
25.7.2 Complexity of Step 2
25.7.3 Absolute Complexity of Shor’s Factoring


List of Figures


List of Tables


Chapter 0 


Introduction 


0.1 Welcome to Volume One 


0.1.1 About this Volume 


This book accompanies CS 83A, the first of a three-quarter quantum computing
sequence offered to sophomores during their second or third years at Foothill Com- 
munity College in Los Altos Hills, California. This first course focuses on quantum 
computing basics under the assumption that we have noise-free quantum components 
with which to build our circuits. The subsequent courses deal with advanced algo- 
rithms and quantum computing in the presence of noise, specifically, error correction 
and quantum encryption. 


This is not a survey course; it skips many interesting aspects of quantum comput- 
ing. On the other hand, it is in-depth. The focus is on doing. My hope is that some 
of you will apply the “hard” skills you learn here to discover quantum algorithms 
of your own. However, even if this is the only course you take on the subject, the 
computational tools you learn will be applicable far beyond the confines of quantum 
information theory. 


0.1.2 About this Introduction 


This short introduction contains samples of the math symbolism we will learn later 
in the course. These equations are intended only to serve as a taste of what’s ahead. 
You are not expected to know what the expressions and symbols mean yet, so don’t 
panic. All will be revealed in the next few weeks. Consider this introduction to be a 
no-obligation sneak preview of things to come. 




0.2 Bits and Qubits 


0.2.1 More Information than Zero and One 


Classical computing is done with bits, which can be either 0 or 1, period. 


Quantum computing happens in the world of quantum bits, called qubits. Until a
qubit, call it $|\psi\rangle$, is directly or indirectly measured, it is in a state that is neither 0
nor 1, but rather a superposition of the two, expressed formally as


$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle.$$


The symbol $|0\rangle$ corresponds to the classical “0” value, while $|1\rangle$ is associated with a
classical value of “1.” Meanwhile, the symbols $\alpha$ and $\beta$ stand for numbers that express
how much “0” and how much “1” are present in the qubit.


We'll make this precise shortly, but the idea that you can have a teaspoon of “0” 
and a tablespoon of “1” contained in a single qubit immediately puts us on alert that 
we are no longer in the world of classical computing. This eerie concept becomes all 
the more magical when you consider that a qubit exists on a sub-atomic level (as
a photon or the spin state of an electron, for example), orders of magnitude smaller
than the physical embodiment of a single classical bit which requires about a million
atoms (or in research labs as few as 12).


That so tiny an entity as a qubit can store so much more information
than a bulky classical bit comes at a price, however.


0.2.2 The Probabilistic Nature of Qubits


A Classical Experiment


If 100 classical one-bit memory locations are known to all hold the same value — call 
it x until we know what that value is — then they all hold x = 0 or all hold x = 1. If 
we measure the first location and find it to be 1, then we will have determined that 
all 100 must hold a 1 (because of the assumption that all 100 locations are storing 
the exact same value). Likewise, if we measure a 0, we’d know that all 100 locations
contain the value 0. Measuring the other 99 locations would confirm our conclusion.
Everything is logical.


Figure 1: After measuring one location, we know them all


A Quantum Experiment 


Qubits are a lot more slippery. Imagine a quantum computer capable of storing 
qubits. In this hypothetical we can inspect the contents of any memory location in 
our computer by attaching an output meter to that location and reading a result off the
meter. 


Let’s try that last experiment in our new quantum computer. We load 100 qubit
memory locations with 100 identically prepared qubits. “Identically prepared” means
that each qubit has the exact same value, call it $|\psi\rangle$. (Never mind that I haven’t
explained what the value of a qubit means; it has some meaning, and I’m asking you
to imagine that all 100 have the same value.)


Next, we use our meter to measure the first location. As in the classical case,
we discover that the output meter registers either a “0” or a “1.” That’s already a
disappointment. We were hoping to get some science-fictiony-looking measurement
from a qubit, especially one with a name like “$|\psi\rangle$.” Never mind; we carry on. Say
the location gives us a measurement of “1.”


Summary to this point. We loaded up all 100 locations with the same qubit, 
peered into the first location, and saw that it contained an ordinary “1.” 


What should we expect if we measure the other 99 locations? Answer: We 
have no idea what to expect. 




Figure 2: After measuring one location, we don’t know much 


Some of the remaining qubits may show us a “1,” others a “0,” all despite the fact
that the 100 locations initially stored identical qubit values. 


This is disquieting. To our further annoyance,


• the measurement of each qubit always produces either a “0” or a “1” on our
meter, despite our having prepared a state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, falsely claiming
to be some exciting combination of the two classical values,


• the measurement will have permanently destroyed the original state we pre-
pared, leaving it in a classical condition of either “0” or “1,” no more magical
superposition left in there,


• as already stated we know nothing (well, almost nothing, but that’s for another
day) about the measurement outcomes of the other 99 supposedly identically
prepared locations, and


• most bizarre of all, in certain situations, measuring the state of any one of these
qubits will cause another qubit in a different computer, room, planet or galaxy
to be modified without the benefit of wires, radio waves or time.


0.2.3 Quantum Mechanics: The Tool that Tames the Beast


Such wild behavior is actually well managed using quantum mechanics, the mathe- 
matical symbol-manipulation game that was invented in the early 20th century to 
help explain and predict the behavior of very, very small things. I cited the trucu-
lent nature of qubits in this introduction as a bit of sensationalism to both scare
and stimulate you. We can work with these things very easily despite their unusual 
nature. 


The challenge for us is that quantum mechanics — and its application to informa- 
tion and algorithms — is not something one can learn in a week or two. But one can 
learn it in a few months, and that’s what we’re going to do in this course. I’ve pre- 
pared a sequence of lessons which will walk you through the fascinating mathematics 
and quantum mechanics needed to understand the new algorithms. Because it takes 
hard, analytical work, quantum computing isn’t for everyone. But my hope is that 
some among you will find this volume an accessible first step from which you can go 
on to further study and eventually invent quantum algorithms of your own. 


0.2.4 Sneak Peek at the Coefficients $\alpha$ and $\beta$


So as not to appear too secretive, I’ll give you a taste of what $\alpha$ and $\beta$ roughly mean
for the state $|\psi\rangle$. They tell us the respective probabilities that we would obtain a
reading of either a “0” or a “1” were we to look into the memory location where $|\psi\rangle$
is stored. (In our quantum jargon, this is called measuring the state $|\psi\rangle$.) If, for
example, the values happened to be


$$\alpha = \frac{1}{2} \quad\text{and}\quad \beta = \frac{\sqrt{3}}{2},$$


then the probability of measuring


0 would be $(1/2)^2 = 1/4 = 25\%$, and
1 would be $(\sqrt{3}/2)^2 = 3/4 = 75\%$.


In other words, if a qubit with the precise and definitive value


$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$


is sitting in a one-(qu)bit memory location, and we look at that location, we will
actually “see” a


0 with probability $|\alpha|^2$ and
1 with probability $|\beta|^2$.


This is far from the whole story as we’ll learn in our very first lecture, but it gives 
you a feel for how the probabilistic nature of the quantum world can be both slippery 
and quantitative at the same time. We don’t know what we’ll get when we query a 
quantum memory register, but we do know what the probabilities will be. 
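

The probability rule above can be tried out numerically. What follows is a minimal Python sketch, not part of the original text. It assumes the example amplitudes 1/2 and √3/2, and the function names `measurement_probabilities` and `simulate_measurements` are hypothetical helpers invented for this illustration. A classical pseudo-random generator merely stands in for a quantum measurement here.

```python
import math
import random

def measurement_probabilities(alpha, beta):
    """Return (P(0), P(1)) = (|alpha|^2, |beta|^2) for the state alpha|0> + beta|1>."""
    p0 = abs(alpha) ** 2
    p1 = abs(beta) ** 2
    # A valid qubit state is normalized: the two probabilities sum to 1.
    if not math.isclose(p0 + p1, 1.0, abs_tol=1e-9):
        raise ValueError("state is not normalized")
    return p0, p1

def simulate_measurements(alpha, beta, copies, seed=None):
    """Simulate measuring `copies` identically prepared qubits; return a list of 0s and 1s."""
    p0, _ = measurement_probabilities(alpha, beta)
    rng = random.Random(seed)
    return [0 if rng.random() < p0 else 1 for _ in range(copies)]

if __name__ == "__main__":
    alpha, beta = 0.5, math.sqrt(3) / 2      # assumed example amplitudes
    p0, p1 = measurement_probabilities(alpha, beta)
    print(f"P(0) = {p0:.2f}, P(1) = {p1:.2f}")
    outcomes = simulate_measurements(alpha, beta, copies=100, seed=1)
    print("zeros:", outcomes.count(0), "ones:", outcomes.count(1))
```

Each simulated reading is an ordinary 0 or 1, and no single reading predicts the next one. Only the long-run proportions, roughly one quarter zeros and three quarters ones, reflect the amplitudes.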


0.3 The Promise of Quantum Computing 


0.3.1 Early Results 


If we’re going to expend time studying math and quantum mechanics, we should 
expect something in return. The field is evolving rapidly, with new successes and 
failures being reported weekly. However, there are a few established results which are 
incontrovertible, and they are the reason so much effort is being brought to bear on 
quantum computer design. 


Of the early results, perhaps the most dramatic is Shor’s period-finding algorithm, 
and it is this that I have selected as the endpoint of our first volume. It provides 
a basis for factoring extremely large numbers in a reasonable time when such feats 
would take classical computers billions of years. The applications, once implemented,
are profound. However, the consequences may give one pause; network security as we
know it would become obsolete. 


Fortunately, or so we believe, there are different quantum techniques that offer 
alternatives to current network security and which could render it far more secure 
than it is today. (These require additional theory, beyond the basics that we learn in 
volume 1 and will be covered in the sequel.) 


There are also less sensational, but nevertheless real, improvements that have 
been discovered. Grover’s search algorithm for unsorted linear lists, while offering
only a modest (quadratic) speed-up over classical searches, is attractive merely by the ubiquity of
search in computing. Related search techniques that look for items over a network 
are being discovered now and promise to replicate such results for graphs rather than 
list structures. 


Quantum teleportation and superdense coding are among the simplest applica-
tions of quantum computing, and they provide a glimpse into possible new approaches
to more efficient communication. We'll get to these in this volume. 




0.3.2 The Role of Computer Scientists 


Quantum computers don’t exist yet. There is production grade hardware that appears 
to leverage quantum behavior but does not exhibit the simple qubit processing needs 
of the early — or indeed most of the current — quantum algorithms in computer 
science. On the other hand, many university rigs possess the “right stuff” for quantum
algorithms, but they are years away from having the stability and/or size to appear 
in manufactured form. 


The engineers and physicists are doing their part. 


The wonderful news for us computer scientists is that we don’t have to wait. 
Regardless of what the hardware ultimately looks like, we already know what it 
will do. That’s because it is based on the most fundamental, firmly established and — 
despite my scary sounding lead-in — surprisingly simple quantum mechanics. We know 
what a qubit is, how a quantum logic gate will affect it, and what the consequences of 
reading qubit registers are. There is nothing preventing us from designing algorithms 
right now. 


0.4 The Two Sides of Quantum Computer Science 


Given that we can strap ourselves in and start work immediately, we should be clear 
on the tasks at hand. There are two. 


• Circuit Design. We know what the individual components will be, even if
they don’t exist yet. So we must gain some understanding and proficiency in
the assembly of these parts to produce full circuits.


• Algorithm Design. Because quantum mechanics is probabilistic by nature,
we’ll have to get used to the idea that the circuits don’t always give us the
answer right away. In some algorithms they do, but in others, we have to send
the same inputs into the same circuits many times and let the laws of probability
play out. This requires us to analyze the math so we can know whether we have
a fighting chance of our algorithm converging to an answer with adequate error
tolerance.


0.4.1 Circuit Design


Classical


Classical logic gates are relatively easy to understand. An AND gate, for example,
has a common symbol and straightforward truth table that defines it:


x   y   x ∧ y
0   0     0
0   1     0
1   0     0
1   1     1


You were introduced to logic like this in your first computer science class. After about 
20 minutes of practice with various input combinations, you likely absorbed the full 
meaning of the AND gate without serious incident. 


Quantum 


A quantum logic gate requires significant vocabulary and symbolism to even define, 
never mind apply. If you promise not to panic, I’ll give you a peek. Of course, you’ll
be trained in all the math and quantum mechanics in this course before we define 
such a circuit officially. By then, you'll be eating quantum logic for breakfast. 


We’ll take the example of something called a second order Hadamard gate. We
would start by first considering the second order qubit on which the gate operates.
Such a thing is symbolized using the mysterious notation and a column of numbers,


$$|\psi\rangle^2 = \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}.$$


Next, we would send this qubit through the Hadamard gate using the symbolism


$$|\psi\rangle^2 \;\longrightarrow\; \boxed{H^{\otimes 2}} \;\longrightarrow\; H^{\otimes 2}\,|\psi\rangle^2.$$


Although it means little to us at this stage, the diagram shows the qubit $|\psi\rangle^2$ entering
the Hadamard gate, and another, $H^{\otimes 2}\,|\psi\rangle^2$, coming out.


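Finally, rather than a truth table, we will need a matrix to describe the behavior of
the gate. Its action on our qubit would be the result of matrix multiplication (another
topic we’ll cover if you haven’t had it),


$$
H^{\otimes 2}\,|\psi\rangle^2
\;=\;
\frac{1}{2}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\;=\;
\frac{1}{2}
\begin{pmatrix}
\alpha + \beta + \gamma + \delta \\
\alpha - \beta + \gamma - \delta \\
\alpha + \beta - \gamma - \delta \\
\alpha - \beta - \gamma + \delta
\end{pmatrix}.
$$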

Again, we see that there is a lot of unlearned symbolism, and not the kind that can be 
explained in a few minutes or hours. We’ll need weeks. But the weeks will be packed 
with exciting and useful information that you can apply to all areas of engineering 
and science, not just quantum computing. 
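

For readers who like to see arithmetic run, the matrix multiplication above can be sketched in a few lines of plain Python. This is an added illustration under stated assumptions: the helper name `apply_matrix` and the sample input vector are hypothetical, and the only thing taken from the text is the 4 × 4 pattern of signs with its overall factor of 1/2.

```python
def apply_matrix(matrix, vector):
    """Multiply a square matrix (a list of rows) by a column vector (a list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Second order Hadamard matrix, with its overall factor of 1/2 folded in.
H2 = [
    [0.5,  0.5,  0.5,  0.5],
    [0.5, -0.5,  0.5, -0.5],
    [0.5,  0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5,  0.5],
]

if __name__ == "__main__":
    # Hypothetical amplitudes (alpha, beta, gamma, delta); complex values also work.
    psi = [1, 0, 0, 0]
    once = apply_matrix(H2, psi)
    print(once)                       # every entry is 0.5
    # Sending the result through the same gate again restores the input.
    print(apply_matrix(H2, once))
```

The second print illustrates a property worth remembering for later chapters: this gate undoes itself when applied twice.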


0.4.2 Algorithms 


In quantum computing, we first design a small circuit using the components that are 
(or will one day become) available to us. An example of such a circuit in diagram 
form (with no explanation offered today) is 


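[Circuit diagram: registers initialized to $|0\rangle^n$, a gate labeled $U_f$, an (actual) measurement, a (conceptual) measurement, and access points A, B and C marked beneath the circuit.]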


There are access points, A, B and C, to assist with the analysis of the circuit. When 
we study these circuits in a few weeks, we’ll be following the state of a qubit as it 
makes its way through each access point. 


Deterministic 


The algorithm may be deterministic, in which case we get an answer immediately. 
The final steps in the algorithm might read: 


We run the circuit one time only and measure the output.


• If we read a zero, the function is constant.


• If we read any non-zero value, the function is balanced.


This will differ from a corresponding classical algorithm that requires, typically, many
evaluations of the circuit (or in computer language, many “loop passes”).




Probabilistic 


Or perhaps our algorithm will be probabilistic, which means that once in a blue moon 
it will yield an incorrect answer. The final steps, then, might be: 


• If the above loop ended after $n + T$ full passes, we failed.


• Otherwise, we succeeded and have solved the problem with error probability
$< 1/2^T$ and with a big-O time complexity of $O(n)$, i.e., in
polynomial time.


Once Again: I don’t expect you to know about time complexity or probability 
yet. You'll learn it all here. 


Whether deterministic or probabilistic, we will be designing circuits and their 
algorithms that can do things faster than their classical cousins. 


0.5 Perspective 


Quantum computing does not promise to do everything better than classical comput- 
ing. In fact, the majority of our processing needs will almost certainly continue to 
be met more efficiently with today’s bit-based logic. We are designing new tools for 
currently unsolvable problems, not to fix things that are currently unbroken. 


0.6 Navigating the Topics 


Most students will find that a cover-to-cover reading of this book does not match 
their individual preparation or goals. One person may skip the chapters on complex 
arithmetic and linear algebra, while another may devote considerable — and I hope 
pleasurable — time luxuriating in those subjects. You will find your path in one of 
three ways: 


1. Self-Selection. The titles of chapters and sections are visible in the click-able 
table of contents. You can use them to evaluate whether a set of topics is likely 
to be worth your time. 


2. Chapter Introductions. The first sentence of some chapters or sections may 
qualify them as optional, intended for those who want more coverage of a spe- 
cific topic. If any such optional component is needed in the later volumes 
accompanying CS 83B or CS 83C, the student will be referred back to it. 


3. Tips Found at the Course Site. Students enrolled in CS 83A at Foothill 
College will have access to the course web site where weekly modules, discussion 
forums and private messages will contain individualized navigation advice. 


Let’s begin by learning some math. 




Chapter 1 


Complex Arithmetic 


1.1 Complex Numbers for Quantum Computing 


I presented the briefest of introductions to the quantum bit, or qubit, defining it as
an expression that combines the usual binary values “0” and “1” (in the guise of
the symbols $|0\rangle$ and $|1\rangle$). The qubit was actually a combination, or superposition, of
those two binary values, each “weighted” by numbers $\alpha$ and $\beta$, respectively,


$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle.$$


But what are these numbers $\alpha$ and $\beta$? If you are careful not to take it too seriously,
you can imagine them to be numbers between 0 and 1, where small values mean less
probable (a small dose) and larger values mean more probable (a high dose).
So a particular qubit value, call it $|\psi_0\rangle$, defined to be


$$|\psi_0\rangle = (.9798)\,|0\rangle + (.200)\,|1\rangle,$$


would mean a large “amount” of the classical bit 0 and a small “amount” of the clas-
sical bit 1. I’m intentionally using pedestrian terminology because it’s going to take
us a few weeks to rigorously define all this. However, I can reveal something imme-
diately: real numbers will not work for $\alpha$ or $\beta$. The vagaries of quantum mechanics
require that these numbers be taken from the richer pool of complex numbers.
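

As a rough sanity check on the example qubit above (an added illustration, not from the original text), we can borrow the rule previewed in the introduction, that the measurement probabilities are the squared magnitudes of the two coefficients, and confirm that the two “doses” account for everything, up to rounding:

```latex
(.9798)^2 + (.200)^2 \;\approx\; .9600 + .0400 \;=\; 1.0000
```

Under that rule, this qubit would read “0” about 96% of the time and “1” about 4% of the time.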


Our quest to learn quantum computing takes us through the field of quantum 
mechanics, and the first step in that effort must always be a mastery of complex 
arithmetic. Today we check off that box. And even if you’ve studied it in the past, 
our treatment today might include a few surprises — results we’ll be using repeatedly 
like Euler’s formula, how to sum complex roots-of-unity, the complex exponential 
function and polar forms. So without further ado, let’s get started. 




1.2 The Field of Complex Numbers 


1.2.1 The Real Numbers Just Don’t Cut It 


We can only go so far in studying quantum information theory if we bind ourselves to
using only the real numbers, $\mathbb{R}$. The hint that $\mathbb{R}$ is not adequate comes from simple
algebra. We know the solution of


$$x^2 - 1 = 0$$


is $x = \pm 1$. But make the slightest revision,


$$x^2 + 1 = 0,$$


and we no longer have any solutions in the real numbers. Yet we need such solutions
in physics, engineering and indeed, in every quantitative field from economics to
neurobiology.


The problem is that the real numbers do not constitute an algebraically closed field:
there are polynomial equations with real coefficients that have no solutions in that number
system. We can force the last equation to have a solution by royal decree: we declare
the number


$$i = \sqrt{-1}$$


to be added to $\mathbb{R}$. It is called an imaginary number. Make sure you understand the
meaning here. We are not computing a square root. We are defining a new number
whose name is $i$, and proclaiming that it has the property that


$$i^2 = -1.$$


$i$ is merely shorthand for the black-and-white pattern of scribbles, “$\sqrt{-1}$.” But those
scribbles tell us the essential property of this new number, $i$.


This gives us a solution to the equation $x^2 + 1 = 0$, but there are infinitely
many other equations that still don’t have solutions. It seems like we would have to
manufacture a new number for every equation that lacks a real solution (a.k.a. a real
zero in math jargon).
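

The defining property of the new number, and the fact that it supplies solutions to the equation that had none among the reals, can be checked with Python’s built-in complex type. This is a small added sketch: Python writes the imaginary unit as `1j`, and the names `I` and `p` are hypothetical choices made for this example.

```python
import cmath

I = 1j  # Python's spelling of the imaginary unit i

def p(x):
    """Evaluate the polynomial x^2 + 1."""
    return x * x + 1

if __name__ == "__main__":
    print(I * I)                     # (-1+0j), i.e. i squared is -1
    print(cmath.sqrt(-1))            # 1j
    print(p(I) == 0, p(-I) == 0)     # True True: both i and -i are roots
    print(p(2.0))                    # 5.0: a real input never gives 0
```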


1.2.2 The Definition of $\mathbb{C}$


However, we get lucky. The number $i$ can’t just be thrown in without also specifying
how we will respond when someone wants to add or multiply it by a number like
3 or -71.6. And once we do that, we start proliferating new combinations, called
complex numbers. Each such non-trivial combination (i.e., one with an $i$ in it) will be
a solution to some equation that doesn’t have a real zero. Here are a few examples
of the kinds of new numbers we will get:


$$2+3i, \quad 1-i, \quad .05+.002i, \quad 7+\sqrt{2}\,i, \quad 52, \quad -100i$$


All of these numbers are of the form


some real number + some real multiple of $i$.


Terminology 


When there is no real part (the first term in the above sum) to the complex number, 
it is called purely imaginary, or just imaginary. The last example in the list above is 
purely imaginary. 

If we take all these combinations, we have a new supersized number system, the 
complex numbers, defined by 


$$\mathbb{C} = \{\, a + bi \mid a \in \mathbb{R},\ b \in \mathbb{R} \ \text{and} \ i = \sqrt{-1} \,\}.$$


1.2.3 The Complex Plane


Since every complex number is defined by an ordered pair of real numbers, $(a, b)$,
where it is understood that


$$(a, b) \longleftrightarrow a + bi,$$


we have a natural way to represent each such number on a plane, whose x-axis is
the real axis (which expresses the value $a$), and y-axis is the imaginary axis (which
expresses the value $b$) (figure 1.1). This looks a lot like the real Cartesian plane, $\mathbb{R}^2$,
but it isn’t. Both pictures are a collection of ordered pairs of real numbers, but $\mathbb{C}$
is richer than $\mathbb{R}^2$ when we dig deeper: it has product and quotient operations, both
missing in $\mathbb{R}^2$. (You’ll see.) For now, just be careful to not confuse them.


Figure 1.1: a few numbers plotted in the complex plane




One usually uses $z$, $c$ or $w$ to denote complex numbers. Often we write the $i$
before the real coefficient:


$$z = x + iy, \qquad c = a + ib, \qquad w = u + iv.$$


In quantum computing, our complex numbers are usually coefficients of the compu-
tational basis states (a new term for those special symbols $|0\rangle$ and $|1\rangle$ we have been
toying with), in which case we may use Greek letters $\alpha$ or $\beta$ for the complex
numbers,


$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle.$$


This notation emphasizes the fact that the complex numbers are scalars of the com- 
plex vector space under consideration. 


[Note. If terms like vector space or scalar are new to you, fear not. I’m not 
officially defining them yet, and we’ll have a full lecture on them. I just want to start 
exposing you to some vocabulary early.]


Equality of Two Complex Numbers 


The criterion for two complex numbers to be equal follows the template set by two
points in $\mathbb{R}^2$ being equal: both coordinates must be equal. If


$$z = x + iy \quad\text{and}\quad w = u + iv,$$


then


$$z = w \iff x = u \ \text{ and } \ y = v.$$


1.2.4 Operations on Complex Numbers 


We define what we mean by addition, multiplication, etc. next. We already know
from the definition of $i$, that


$$i^2 = -1,$$


which tells us the answer to $i \cdot i$. From there, we build up by assuming that the oper-
ations $\times$ and $+$ obey the same kinds of laws (commutative, associative, distributive)
as they do for the reals. With these ground rules, we can quickly demonstrate that
the only way to define addition (subtraction) and multiplication is


$$(a + ib) \pm (c + id) = (a \pm c) + i(b \pm d)$$


$$(a + ib)(c + id) = (ac - bd) + i(ad + bc).$$
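

The two rules above translate directly into code if each complex number is stored as an ordered pair of reals. The sketch below is an added illustration; the function names `add` and `multiply` and the sample values are hypothetical, and Python’s built-in complex type is used only as a cross-check.

```python
def add(z, w):
    """(a + ib) + (c + id) = (a + c) + i(b + d), with z = (a, b) and w = (c, d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def multiply(z, w):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

if __name__ == "__main__":
    z, w = (2, 3), (1, -1)                # 2 + 3i and 1 - i
    print(add(z, w))                      # (3, 2)
    print(multiply(z, w))                 # (5, 1)
    # Cross-check against Python's built-in complex type.
    print(complex(*z) * complex(*w))      # (5+1j)
    # The pair (0, 1) plays the role of i: its square is (-1, 0).
    print(multiply((0, 1), (0, 1)))       # (-1, 0)
```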




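The rule for division can actually be derived from these,


$$
\begin{aligned}
\frac{a + ib}{c + id}
&= \left(\frac{a + ib}{c + id}\right)\left(\frac{c - id}{c - id}\right) \\
&= \frac{(ac + bd) + i(bc - ad)}{c^2 + d^2} \\
&= \frac{ac + bd}{c^2 + d^2} + i\,\frac{bc - ad}{c^2 + d^2}, \qquad \text{where } c^2 + d^2 \neq 0.
\end{aligned}
$$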


A special consequence of this is the oft cited identity


$$\frac{1}{i} = -i.$$


[Exercise. Prove it.]
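

A numerical spot-check is no substitute for the proof requested in the exercise, but it is a quick way to catch algebra slips. The sketch below is an added illustration that stores each complex number as an ordered pair of reals; the function name `divide` and the sample values are hypothetical.

```python
def divide(z, w):
    """Return (a + ib)/(c + id) as a pair (real part, imaginary part)."""
    (a, b), (c, d) = z, w
    denom = c * c + d * d                 # c^2 + d^2, which must not be 0
    if denom == 0:
        raise ZeroDivisionError("cannot divide by the complex number 0")
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

if __name__ == "__main__":
    print(divide((1, 0), (0, 1)))         # (0.0, -1.0): dividing 1 by i gives -i
    print(divide((2, 3), (1, -1)))        # (-0.5, 2.5)
    # Cross-check against Python's built-in complex division.
    print(complex(2, 3) / complex(1, -1))
```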


Addition (or subtraction) can be pictured as the vectorial sum of the two complex
numbers (see figure 1.2). However, multiplication is more easily visualized when we
get to polar coordinates, so we’ll wait a few minutes before showing that picture.


Figure 1.2: visualization of complex addition


[Exercise. For the following complex numbers,


$$2+3i, \quad 1-i, \quad .05+.002i, \quad 7+\sqrt{2}\,i, \quad 52, \quad -100i,$$


(a) square them,
(b) subtract $1 - i$ from each of them,
(c) multiply each by $1 - i$,
(d) divide each by 5,
(e) divide each by $i$, and
(f) divide any three of them by $1 - i$.]


[Exercise. Explain why it is not necessarily true that the square of a complex
number, $z^2$, is positive or zero, or even real, for that matter. If $z^2$ is real, does that
mean it must be non-negative?]




1.2.5 $\mathbb{C}$ is a Field


A number system that has addition and multiplication replete with the usual prop-
erties is called a field. What we have outlined above is the fact that $\mathbb{C}$ is, like $\mathbb{R}$, a
field. When you have a field, you can then create a vector space over that field by
taking n-tuples of numbers from that field. Just as we have real n-dimensional vector
spaces, $\mathbb{R}^n$, we can as easily create n-dimensional vector spaces over $\mathbb{C}$ which we call
$\mathbb{C}^n$. (We have a whole lesson devoted to defining real and complex vector spaces.)


1.3 Exploring the Complex Plane


1.3.1 Complex Numbers as Ordered Pairs of Real Numbers 


We already saw that each complex number has two aspects to it: the real term and
the term that has the $i$ in it. This creates a natural correspondence between $\mathbb{C}$ and
$\mathbb{R}^2$,


$$x + iy \longleftrightarrow (x, y).$$


As a consequence, a special name is given to Cartesian coordinates when applied to 
complex numbers: the complex plane. 


[Advanced Readers. For those of you who already know about vector spaces,
real and complex, I'll add a word of caution. This is not the same as a complex
vector space consisting of ordered pairs of complex numbers (z, w). The complex
plane consists of one point for every complex number, not a point for every ordered
pair of complex numbers.]


1.3.2 Real and Imaginary Axes and Polar Representation 


The axes can still be referred to as the x-axis and y-axis, but they are more commonly
called the real axis and imaginary axis. The number i sits on the imaginary axis, one
unit above the real axis. The number -3 is three units to the left of the imaginary
axis on the real axis. The number 1 + i is in the “first quadrant.” [Exercise. Look
up that term and describe what the second, third and fourth quadrants are.]


Besides using (x, y) to describe z, we can use the polar representation suggested
by polar coordinates $(r, \theta)$ of the complex plane (figure 1.3).

\[ x + iy \;\longleftrightarrow\; r(\cos\theta + i\sin\theta) \]


Figure 1.3: the connection between Cartesian and polar coordinates of a complex
number


Terminology
    $x = \mathrm{Re}(z)$,       the “real part” of z
    $y = \mathrm{Im}(z)$,       the “imaginary part” of z
    $r = |z|$,         the “modulus” (“magnitude”) of z
    $\theta = \arg z$,      the “argument” (“polar angle”) of z
    $3i$, $\pi i$, $900i$, etc.   “purely imaginary” numbers


Note. Modulus will be discussed more fully in a moment. For now, |z| = r can be 
taken as a definition or just terminology with the definition to follow. 

1.3.3 Complex Conjugate and Modulus 

The Conjugate of a Complex Number 


We can apply an operation, complex conjugation, to any complex number to produce
another complex number by negating the imaginary part. If

\[ z = x + iy, \]

then its complex conjugate (or just conjugate) is designated and defined as

\[ z^* = x - iy, \qquad \text{the “complex conjugate” of } z. \]

A common alternate notation places a horizontal bar above z (instead of an asterisk
to the upper right),

\[ \bar{z} = z^*. \]

Geometrically, this is like reflecting z across the x (real) axis (figure 1.4).


Examples

\[
\begin{aligned}
(35 + 8i)^* &= 35 - 8i \\
(2 - \sqrt{2}\,i)^* &= 2 + \sqrt{2}\,i \\
(-3\sqrt{2}\,i)^* &= 3\sqrt{2}\,i \\
(1.59)^* &= 1.59
\end{aligned}
\]


It is easy to show that conjugation distributes across sums and products, i.e.,

\[ (wz)^* = w^* z^* \qquad \text{and} \qquad (w + z)^* = w^* + z^*. \]


These little factoids will come in handy when we study kets, bras and Hermitian 
conjugates in a couple weeks. 


[Exercise. Prove both assertions. What about quotients?] 


The Modulus of a Complex Number 


Just as in the case of $\mathbb{R}^2$, the modulus of the complex number z is the length of the line
segment (in the complex plane) from 0 to z, that is,

\[ |z| \;=\; \sqrt{\mathrm{Re}(z)^2 + \mathrm{Im}(z)^2} \;=\; \sqrt{x^2 + y^2}. \]


Figure 1.5: modulus of a complex number


A short computation shows that multiplying z by its conjugate, $z^*$, results in a
non-negative real number which is the square of the modulus of z,

\[ |z|^2 \;=\; z^* z \;=\; z z^* \;=\; x^2 + y^2, \qquad |z| \;=\; \sqrt{z^* z}. \]


[Exercise. Fill in any details that are not immediately apparent.] 


In these lectures you may have seen (and will continue to see) me square the
absolute value of a number or function. You might have wondered why I bothered to
use absolute value signs around something I was about to square. Now you know: the
value of the number might be complex and simply squaring it would not necessarily
result in a real (much less positive) number. The square of a complex number is
not normally real. (If you didn't do the complex arithmetic exercise earlier, do it now
to get a few concrete examples of this phenomenon.)
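For one concrete example of this, here is a small Python sketch. The number 3 - 4i is my own choice for illustration. Squaring it directly gives a complex result, while multiplying by the conjugate gives the real, non-negative square of the modulus.

```python
z = complex(3, -4)

square = z * z                       # plain square: (3 - 4i)^2
mod_sq = (z * z.conjugate()).real    # z z*: always real and non-negative
modulus = abs(z)                     # Python's abs() is the modulus |z|

print(square)    # a complex number, neither positive nor even real
print(mod_sq)    # the square of the modulus
print(modulus)
```

Here `square` comes out as -7 - 24i, while `mod_sq` is 25 and `modulus` is 5.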


1.4 Transcendental Functions and Their Identities 


The term transcendental function has a formal definition, but for our purposes it
means functions like $\sin x$, $\cos x$, $e^x$, $\sinh x$, etc. It's time to talk about how they are
defined and relate to complex arithmetic.


1.4.1 The Complex Exponential Function Part 1: Pure Imaginary Case


From calculus, you may have learned that the real exponential function, $\exp(x) = e^x$,
can be expressed (by some authors, defined) in terms of an infinite sum, the Taylor
expansion or Taylor series,

\[ \exp(x) \;=\; e^x \;=\; \sum_{n=0}^{\infty} \frac{x^n}{n!}. \]

This suggests that we can define a complex exponential function that has a similar
expansion, only for a complex z rather than a real x.


Complex Exponential of a Pure Imaginary Number and Euler’s Formula 


We start by defining a new function of a purely imaginary number, $i\theta$, where $\theta$ is the
real angle (or arg) of the number,

\[ \exp(i\theta) \;=\; e^{i\theta} \;\equiv\; \sum_{n=0}^{\infty} \frac{(i\theta)^n}{n!}. \]

(A detail that I am skipping is the proof that this series converges to a complex
number for all real $\theta$. But believe me, it does.)


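Let's expand the sum, but first, an observation about increasing powers of i.

\[ i^0 = 1, \quad i^1 = i, \quad i^2 = -1, \quad i^3 = -i, \quad i^4 = 1, \quad i^5 = i, \quad \ldots, \quad i^{n+4} = i^n. \]

So,

\[
e^{i\theta} \;=\; 1 + \frac{i\theta}{1!} - \frac{\theta^2}{2!} - \frac{i\theta^3}{3!} + \frac{\theta^4}{4!}
 + \frac{i\theta^5}{5!} - \frac{\theta^6}{6!} - \frac{i\theta^7}{7!} + \frac{\theta^8}{8!} + \cdots
\]

Rearrange the terms so that all the real terms are together and all the imaginary
terms are together,

\[
e^{i\theta} \;=\; \left( 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots \right)
 \;+\; i \left( \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots \right).
\]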


You may recognize the two parenthetical expressions as the Taylor series for $\cos\theta$ and
$\sin\theta$, and we pause to summarize this result of profound and universal importance.


Euler's Formula

\[ e^{i\theta} \;=\; \cos\theta + i\sin\theta. \]


Notice the implication,

\[ \left| e^{i\theta} \right|^2 \;=\; \cos^2\theta + \sin^2\theta \;=\; 1, \]

which leads to one of the most necessary and widely used facts in all of physics and
engineering,

\[ \left| e^{i\theta} \right| \;=\; 1, \qquad \text{for real } \theta. \]


[Exercise. Prove this last equality without recourse to Euler's formula, using
exponential identities alone.]

Euler's formula tells us how to visualize the exponential of a pure imaginary. If we
think of $\theta$ as time, then $e^{i\theta}$ is a “speck” (if you graph it in the complex plane) traveling
around the unit circle counter-clockwise at 1 radian per second (see Figure 1.6). I
cannot overstate the importance of Euler's formula and the accompanying picture.
You'll need it in this course and in almost any other engineering study.


Figure 1.6: $e^{i\theta}$ after $\theta$ seconds, @ 1 rad/sec
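To see Euler's formula numerically, the following is a minimal Python sketch that sums the first few terms of the series for exp(iθ) and compares the result with cos θ + i sin θ. The function name `exp_i_theta_series`, the choice of 30 terms, and the test angle θ = 2 are my own assumptions for illustration.

```python
import math

def exp_i_theta_series(theta, terms=30):
    """Partial sum of sum_{n>=0} (i*theta)^n / n! using the first `terms` terms."""
    total = 0j
    term = 1 + 0j                      # the n = 0 term, (i*theta)^0 / 0!
    for n in range(terms):
        total += term
        term *= 1j * theta / (n + 1)   # turn the n-th term into the (n+1)-th
    return total

theta = 2.0
approx = exp_i_theta_series(theta)
euler = complex(math.cos(theta), math.sin(theta))

print(approx, euler)     # the two should agree to many decimal places
print(abs(approx))       # should be very close to 1 (a point on the unit circle)
```

Each new term is obtained from the previous one by multiplying by iθ/(n+1), which avoids computing large powers and factorials separately.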




1.4.2 Real sin() and cos() in Terms of the Complex Exponential


Now try this: Plug $-\theta$ (minus theta) into Euler's formula, then add (or subtract) the
resulting equation to (or from) the original formula. Because of the trigonometric identities

\[ \sin(-\theta) = -\sin(\theta) \qquad \text{and} \qquad \cos(-\theta) = \cos(\theta), \]

the so-called oddness and evenness of sin and cos, respectively, you would quickly
discover the first (or second) equality

\[ \cos\theta \;=\; \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta \;=\; \frac{e^{i\theta} - e^{-i\theta}}{2i}. \]


These appear often in physics and engineering, and we'll be relying on them later in 
the course. 
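These two identities are easy to spot-check numerically. The sketch below is only an illustration; the helper names `cos_from_exp` and `sin_from_exp` are hypothetical, and the library call `cmath.exp` stands in for the complex exponential defined above.

```python
import cmath
import math

def cos_from_exp(theta):
    """cos(theta) rebuilt as (e^{i theta} + e^{-i theta}) / 2."""
    return ((cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2).real

def sin_from_exp(theta):
    """sin(theta) rebuilt as (e^{i theta} - e^{-i theta}) / (2i)."""
    return ((cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j).real

for t in (0.0, 0.7, -2.5):
    print(t, cos_from_exp(t) - math.cos(t), sin_from_exp(t) - math.sin(t))
```

The printed differences should be zero up to floating-point rounding.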


1.4.3. Complex Exponential Part 2: Any Complex Number 


We have defined exp() for pure imaginaries as an infinite sum, and we already knew
that exp() for reals was an infinite sum, so we combine the two to define exp() for
any complex number. If z = x + iy,

\[ \exp(z) \;=\; e^z \;=\; e^{x+iy} \;\equiv\; e^x\, e^{iy}. \]

We can do this because each factor on the far right is a complex number (the first,
of course, happens to also be real), so we can take their product.

For completeness, we should note (this requires proof, not supplied) that everything
we have done leads to the promised Taylor expansion for $e^z$ as a function of a
complex z, namely,

\[ \exp(z) \;=\; e^z \;=\; \sum_{n=0}^{\infty} \frac{z^n}{n!}. \]


This definition implies, among other things, the correct behavior of exp(z) with regard
to addition of “exponents,” that is

\[ \exp(z+w) \;=\; \exp(z)\exp(w), \]

or, more familiarly,

\[ e^{z+w} \;=\; e^z\, e^w. \]
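A short numerical check may make the definition of exp(z) and the law of exponents more tangible. This is a sketch under my own assumptions: the sample values of z and w are arbitrary, and the library function `cmath.exp` is used as the reference complex exponential.

```python
import cmath
import math

z = complex(0.5, 2.0)
w = complex(-1.0, 0.3)

# exp(z) built from the definition e^x * e^{iy}, with e^{iy} from Euler's formula
via_definition = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

print(via_definition, cmath.exp(z))                 # should agree
print(cmath.exp(z + w), cmath.exp(z) * cmath.exp(w))  # law of exponents
```

Both printed pairs should match up to rounding error.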


Easy Proof of Addition Law of Sines and Cosines 


A consequence of all this hard work is an easy way to remember the trigonometric
addition laws. I hope you're sitting down. Take A, B two real numbers (they can
be angles). From Euler's formula, using $\theta = A + B$, we get:

\[ e^{i(A+B)} \;=\; \cos(A+B) + i\sin(A+B). \]

On the other hand, we can apply Euler's formula to $\theta = A$ and $\theta = B$, individually,
in this product:

\[
\begin{aligned}
e^{iA}\, e^{iB} &= (\cos A + i\sin A)(\cos B + i\sin B) \\
 &= (\cos A\cos B - \sin A\sin B) \;+\; i(\sin A\cos B + \cos A\sin B).
\end{aligned}
\]

Because of the law of exponents we know that the LHS of the last two equations are
equal, so their RHSs must also be equal. Finally, equate the real and imaginary parts:

\[
\begin{aligned}
\cos(A+B) &= \cos A\cos B - \sin A\sin B \\
\sin(A+B) &= \sin A\cos B + \cos A\sin B
\end{aligned}
\]

QED


1.4.4 The Complex Trigonometric Functions 


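We won't really need these, but let's record them for posterity. For a complex z, we
have,

\[ \cos z \;\equiv\; 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \frac{z^6}{6!} + \cdots \]

and

\[ \sin z \;\equiv\; \frac{z}{1!} - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots . \]

We also get an Euler-like formula for general complex z, namely,

\[ e^{iz} \;=\; \cos z + i\sin z. \]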


1.4.5 Polar Relations Expressed Using the Exponential 


We are comfortable expressing a complex number as the sum of its real and imaginary
parts,

\[ z = x + iy. \]

But from the equivalence of the Cartesian and polar coordinates of a complex number
seen in figure 1.3 and expressed by

\[ x + iy \;\longleftrightarrow\; r(\cos\theta + i\sin\theta), \]

we can use the Euler formula on the RHS to obtain the very useful

\[ z \;=\; x + iy \;=\; r e^{i\theta}. \]


The latter version gives us a variety of important identities. If z and w are two
complex numbers expressed in polar form,

\[ z = r e^{i\theta} \qquad \text{and} \qquad w = s e^{i\phi}, \]

then we have

\[ zw \;=\; rs\, e^{i(\theta + \phi)} \qquad \text{and} \qquad \frac{z}{w} \;=\; \frac{r}{s}\, e^{i(\theta - \phi)}. \]

Notice how the moduli multiply or divide and the args add or subtract (figure 1.7).


Figure 1.7: multiplication — moduli multiply and args add 


Also, we can express reciprocals and conjugates nicely, using

\[ \frac{1}{z} \;=\; \frac{1}{r}\, e^{-i\theta} \qquad \text{and} \qquad z^* \;=\; r e^{-i\theta}. \]

In fact, that last equation is so useful, I'll restate it slightly. For any real number, $\theta$,

\[ \left( r e^{i\theta} \right)^* \;=\; r e^{-i\theta}. \]


[Exercise. Using polar notation, find a short proof that the conjugate of a product
(quotient) is the product (quotient) of the conjugates. (This is an exercise you may
have done above using more ink.)]



A common way to use these relationships is through equivalent identities that put
the emphasis on the modulus and arg, separately,

\[
\begin{aligned}
|zw| &= |z|\,|w|, \\
\left| \frac{z}{w} \right| &= \frac{|z|}{|w|}, \\
|z^*| &= |z|, \\
\arg(zw) &= \arg z + \arg w, \\
\arg\left( \frac{z}{w} \right) &= \arg z - \arg w, \quad \text{and} \\
\arg(z^*) &= -\arg z.
\end{aligned}
\]


[Exercise. Verify all of the last dozen or so polar identities using the results of the
earlier sections.]
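The polar identities can be spot-checked with Python's standard `cmath` module, whose `rect` and `polar` functions convert between (r, θ) and x + iy. The sample moduli and angles below are my own choices. One caveat that is an assumption of this sketch: `cmath.polar` reports the arg in the range (-π, π], so args "add" only up to a multiple of 2π; the angles here are chosen small enough that no wrap-around occurs.

```python
import cmath

z = cmath.rect(2.0, 0.5)     # r = 2, theta = 0.5
w = cmath.rect(3.0, 1.2)     # s = 3, phi = 1.2

prod_r, prod_arg = cmath.polar(z * w)            # expect (6, 1.7)
quot_r, quot_arg = cmath.polar(z / w)            # expect (2/3, -0.7)
conj_r, conj_arg = cmath.polar(z.conjugate())    # expect (2, -0.5)

print(prod_r, prod_arg)
print(quot_r, quot_arg)
print(conj_r, conj_arg)
```

Moduli multiply or divide, args add or subtract, and conjugation flips the sign of the arg.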


1.5 Roots of Unity 


An application of complex arithmetic that will be used extensively in Shor’s algorithm 
and the quantum Fourier transform involves the roots of unity. 


1.5.1 N Distinct Solutions to $z^N = 1$


Here is something you don't see much in the real numbers. The equation

\[ z^N = 1, \]


with a positive integer, N, has either one or two real roots (“zeros”), depending on 
the “parity” of N. That is, if N is odd, the only zero is 1. If N is even, there are two 
zeros, -1 and 1. 


In complex algebra, things are different. We're going to see that there are N
distinct solutions to this equation. But first, we take a step back.
Consider the complex number (in polar form),

\[ \omega_N \;\equiv\; e^{i(2\pi/N)}. \]

$\omega_N$ is usually written

\[ e^{2\pi i/N}, \]

but I grouped the non-i factor in the exponent so you could clearly see that $\omega_N$ was
of the form $e^{i\theta}$ that we just finished studying. Indeed, knowing that $\theta = 2\pi/N$ is a
real number allows us to use Euler's formula,

\[ e^{i\theta} = \cos\theta + i\sin\theta, \]

to immediately conclude the following:

• $\omega_N$ lies somewhere on the unit circle,

• $\omega_1 = e^{2\pi i} = 1$,

• $\omega_2 = e^{\pi i} = -1$,

• $\omega_3 = e^{2\pi i/3} = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i$,

• $\omega_4 = e^{\pi i/2} = i$, and

• for N > 4, as N increases (5, 6, 7, etc.), $\omega_N$ marches clockwise along the upper
  half of the unit circle approaching 1 (but never reaching it). For example $\omega_{1999}$
  is almost indistinguishable from 1, just above it.


For any N > 0, we can see that

\[ (\omega_N)^N \;=\; e^{iN(2\pi/N)} \;=\; e^{2\pi i} \;=\; \cos 2\pi + i\sin 2\pi \;=\; 1, \]

earning it the name Nth root of unity (sometimes hyphenated Nth root-of-unity) as
well as the symbolic formulation

\[ \omega_N \;=\; \sqrt[N]{1}. \]

In fact, we should call this the primitive Nth root-of-unity to distinguish it from its
siblings which we'll meet in a moment. Finally, we can see that $\omega_N$ is a non-real
(when N > 2) solution to the equation

\[ z^N = 1. \]

Often, when we are using the same N and $\omega_N$ for many pages, we omit the subscript
and use the simpler,

\[ \omega \;\equiv\; \omega_N, \]

with N understood by context, especially helpful when used in large formulas.


If we fix a primitive Nth root-of-unity for some N (think N = 5 or larger), and we
take powers of $\omega_N$, like $\omega_N$, $(\omega_N)^2$, $(\omega_N)^3$, etc., each power is just the rotation on the
unit circle of the previous power, counter-clockwise, by the angle $2\pi/N$. [Exercise.
Explain why.] Eventually we'll wrap around back to 1, producing N distinct complex
numbers,

\[ e^{i(2\pi/N)}, \quad e^{i2(2\pi/N)}, \quad e^{i3(2\pi/N)}, \quad \ldots, \quad e^{iN(2\pi/N)} = e^{2\pi i} = 1. \]

(See Figure 1.8.) These are also Nth roots-of-unity, generated by taking powers of
the primitive Nth root.


[Exercise. Why are they called Nth roots-of-unity? Hint: Raise any one of them
to the Nth power.]


Figure 1.8: the fifth roots-of-unity


For our purposes, we'll consider the primitive Nth root-of-unity to be the one
that has the smallest positive arg (angle), namely, $e^{2\pi i/N}$, and we'll call all the other powers
ordinary, non-primitive roots. However, this is not how mathematicians would make
the distinction, and does not match the actual definition which includes some of the
other roots, a subtlety we can safely overlook.


As you can see, all N of the Nth roots are the vertices of a regular polygon
inscribed in the unit circle, with one vertex anchored at the real number 1. This is a
fact about roots-of-unity that you'll want to remember. And now we have met all N
of the distinct solutions to the equation $z^N = 1$.
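The N roots can be generated in a few lines of Python by taking powers of the primitive root. This is a minimal sketch; the function name `roots_of_unity` and the choice N = 5 are assumptions made for illustration only.

```python
import cmath

def roots_of_unity(N):
    """Return the N powers omega^0, omega^1, ..., omega^(N-1) of omega = e^{2 pi i / N}."""
    omega = cmath.exp(2j * cmath.pi / N)    # the primitive Nth root in the sense used here
    return [omega ** k for k in range(N)]

roots = roots_of_unity(5)
for r in roots:
    # each root has modulus 1, and its 5th power returns to 1 (up to rounding)
    print(r, abs(r), r ** 5)
```

Plotting the real and imaginary parts of `roots` would show the vertices of a regular pentagon on the unit circle, with one vertex at 1.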


1.5.2 Euler’s Identity 


By looking at the fourth roots-of-unity, you can get some interesting relationships.

\[ e^{i\pi/2} = i, \qquad e^{2\pi i} = 1, \qquad \text{and} \qquad e^{i\pi} = -1. \]

(See Figure 1.9.) That last equation is also known as Euler's identity (distinct from
Euler's formula). Sometimes it is written in the form

\[ e^{i\pi} + 1 = 0. \]


Figure 1.9: Three of the fourth roots-of-unity (find the fourth)


Take a moment to ponder this amazing relationship between the constants, i, e and
$\pi$.


1.5.3. Summation Notation 


There's a common notation used to write out long sums called the summation notation.
We'll use it throughout the course, beginning with the next subsection, so let's
formally introduce it here. Instead of using the ellipsis (...) to write a long sum, as
in

\[ a_1 + a_2 + a_3 + \ldots + a_n, \]

we symbolize it like this

\[ \sum_{k=1}^{n} a_k. \]

The index starting the sum is indicated below the large Greek Sigma, ($\Sigma$), and the
final index in the sum is placed above it. For example, if we wanted to start the sum
from $a_0$ and end at $a_{n-1}$, we would write

\[ \sum_{k=0}^{n-1} a_k. \]

A common variation is the infinite sum, written as

\[ \sum_{k=0}^{\infty} a_k, \]

where the start of the sum can be anything we want it to be: 0, 1, -5 or even $-\infty$.


[Exercise. Write out the sums 


and 


using summation notation.]


1.5.4 Summing Roots-of-Unity and the Kronecker Delta 


We finish this tutorial on complex arithmetic by presenting some facts about roots-of-unity
that will come in handy in a few weeks. You can take some of these as exercises
to confirm that you have mastered the ability to calculate with complex numbers.


For the remainder of this section, let's use the shorthand that I advertised earlier
and call the primitive Nth root-of-unity $\omega$, omitting the subscript N, which will be
implied.

We have seen that $\omega$, as well as all of its integral powers,

\[ \omega, \;\; \omega^2, \;\; \ldots, \;\; \omega^{N-1}, \;\; \omega^N = \omega^0 = 1, \]

are solutions of the Nth degree polynomial equation

\[ z^N - 1 = 0. \]

But these N complex numbers are also solutions of another polynomial equation,

\[ (z-1)\,(z-\omega)\,(z-\omega^2) \cdots (z-\omega^{N-1}) \;=\; 0. \]

To see this, just plug any of the Nth roots into this equation and see what happens.
Any two polynomials of degree N with leading coefficient 1 that share the same N
distinct roots are equal, so

\[ z^N - 1 \;=\; (z-1)\,(z-\omega)\,(z-\omega^2) \cdots (z-\omega^{N-1}). \]

By high school algebra, you can easily verify that the LHS can also be factored to

\[ z^N - 1 \;=\; (z-1)\left( z^{N-1} + z^{N-2} + \cdots + z^2 + z + 1 \right). \]

We can therefore equate the RHS of the last two equations (and divide out the
common (z - 1)) to yield

\[ z^{N-1} + z^{N-2} + \cdots + z + 1 \;=\; (z-\omega)\,(z-\omega^2) \cdots (z-\omega^{N-1}). \]

This equation has a number of easily confirmed consequences that we will use going
forward. We list each one as an exercise.


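Exercises


(a) Show that when 0 < l < N,

\[ \omega^{l(N-1)} + \omega^{l(N-2)} + \cdots + \omega^{l} + 1 \;=\; 0. \]

Hint: Plug $\omega^l$ in for z in the last equation.


(b) Show that when -N < l < N,

\[
\omega^{l(N-1)} + \omega^{l(N-2)} + \cdots + \omega^{l} + 1 \;=\;
\begin{cases} N, & l = 0 \\ 0, & -N < l < N,\ l \neq 0 \end{cases}
\]

Hint: Prove that, for all l, $\omega^{l+N} = \omega^{l}$, and apply the last result.


(c) Show that for any integer l,

\[
\omega^{l(N-1)} + \omega^{l(N-2)} + \cdots + \omega^{l} + 1 \;=\;
\begin{cases} N, & l \equiv 0 \pmod{N} \\ 0, & l \not\equiv 0 \pmod{N} \end{cases}
\]

Hint: Add (or subtract) an integral multiple of N to (or from) l to bring it into
the interval [-N, N) and call l' the new value of l. Argue that $\omega^{l(N-k)} = \omega^{l'(N-k)}$ for each k,
so this doesn't change the value of the sum. Finally, apply the last result.


(d) Show that for 0 ≤ j < N and 0 ≤ m < N,

\[
\omega^{(j-m)(N-1)} + \omega^{(j-m)(N-2)} + \cdots + \omega^{(j-m)} + 1 \;=\;
\begin{cases} N, & j = m \\ 0, & j \neq m \end{cases}
\]

Hint: Set l = j - m and apply the last result.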


Kronecker Delta 


All of these exercises have a common theme: When we add all of the Nth roots-of-unity
raised to certain powers, there is a massive cancellation causing the sum
to be zero except for special cases. I'll repeat the last result using the Kronecker delta
symbol, as this will be very important in some upcoming algorithms. First, what is
the “Kronecker delta”?


    Kronecker Delta. $\delta_{kj}$, the Kronecker delta, is the mathematical way
    to express anything that is to be 0 unless the index k = j, in which case
    it is 1,

    \[ \delta_{kj} \;=\; \begin{cases} 1, & \text{if } k = j \\ 0, & \text{otherwise.} \end{cases} \]


You will see the Kronecker delta throughout the course (and beyond), starting immediately,
as I rewrite result (d) using it.

\[ \sum_{k=0}^{N-1} \omega^{(j-m)k} \;=\; N\, \delta_{jm}. \]

We Are In the Air 


Our flight is officially off the ground. We’ve learned or reviewed everything needed 
about complex arithmetic to sustain ourselves through the unusual terrain of quantum 
computing. There’s a second subject that we will want to master, and that’ll be our 
first fly-over. Look out the window, as it should be coming into view now: linear 
algebra. 




Chapter 2 


Real Vector Spaces 


2.1 Vector Spaces for Quantum Computing 


In each of these lessons we will inch a little closer to a complete understanding of 
the qubit. At the moment, all we know about a qubit is that it can be written as a 
collection of abstract symbols, 


\[ |\psi\rangle \;=\; \alpha\,|0\rangle + \beta\,|1\rangle, \]


where last lecture revealed two of the symbols, $\alpha$ and $\beta$, to be complex numbers.
Today, we add a little more information about it.


    The qubit, $|\psi\rangle$, is a vector quantity.


Every vector lives in a world of other vectors, all of which bear certain similarities. 
That world is called a vector space, and two of the similarities that all vectors in any 
given vector space share are 


• its dimension (i.e., is it two dimensional? three dimensional? 10 dimensional?),
and


• the kinds of ordinary numbers (or scalars) that support the vector space
operations (i.e., does it use real numbers? complex numbers? the tiny set of
integers {0, 1}?).


In this lecture we'll restrict our study to real vector spaces — those whose scalars 
are the real numbers. We’ll get to complex vector spaces in a couple days. As for 
dimension, we'll start by focusing on two-dimensional vector spaces, and follow that 
by meeting some higher dimensional spaces. 




2.2 Vectors and Vector Spaces 


$\mathbb{R}^2$, sometimes referred to as Euclidean 2-space, will be our poster child for vector
spaces, and we'll see that everything we learn about $\mathbb{R}^2$ applies equally well to higher
dimensional vector spaces like $\mathbb{R}^3$ (three-dimensional), $\mathbb{R}^4$ (four-dimensional) or $\mathbb{R}^n$
(n-dimensional, for any positive integer, n).


With that overview, let's get back down to earth and define the two-dimensional
real vector space $\mathbb{R}^2$.


2.2.1 Standard Equipment: The Axioms 


Every vector space has some rules, called the axioms, that define its objects and the
way in which those objects can be combined. When you learned about the integers,
you were introduced to the objects, (..., -3, -2, -1, 0, 1, 2, 3, ...) and the rules
(2+2 = 4, (-3)(2) = -6, etc.). We now define the axioms (the objects and rules)
for the special vector space $\mathbb{R}^2$.


The Objects 


A vector space requires two sets of things to make sense: the scalars and the vectors. 


Scalars 


A vector space is based on some number system. For example, $\mathbb{R}^2$ is built on the
real numbers, $\mathbb{R}$. These are the scalars of the vector space. In math lingo the scalars
are referred to as the underlying field.


Vectors 


The other set of objects that constitute a vector space are the vectors. In the case
of $\mathbb{R}^2$ they are ordered pairs,


r=(4) (MR) (2) (8). 


You'll note that I use boldface to name the vectors. That's to help distinguish a
vector variable name like r, x or v, from a scalar name, like a, x or $\alpha$. Also, we will
usually consider vectors to be written as columns like $\begin{pmatrix} 3 \\ -7 \end{pmatrix}$, not rows like (3, -7),
although this varies by author and context.


A more formal and complete description of the vectors in $\mathbb{R}^2$ is provided using set
notation,

\[ \mathbb{R}^2 \;\equiv\; \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\}. \]

(See figure 2.1.) This is somewhat incomplete, though, because it only tells what the
set of vectors is and fails to mention the scalars or rules.


Figure 2.1: A vector in $\mathbb{R}^2$


Notation: Vector Transpose. When writing vectors in a paragraph, I may
write (3, -7) when I'm being lazy, or $(3, -7)^t$, when I'm being good. The latter is
pronounced “(3, -7) transpose”, and it means “turn this row vector into a column
vector.” It goes both ways. If I had a column vector and wanted to turn it into a
row vector, I would tag a superscript $(\;)^t$ onto the column vector.


Vocabulary. When we want to emphasize which set of scalars we are using for a
given vector space, we would say that the vector space is defined over [insert name
of scalars here]. For example, the vector space $\mathbb{R}^2$ is a vector space over the reals (or
over $\mathbb{R}$).


The Rules


Two operations and their properties are required for the above objects.


Vector Addition 


We must be able to combine two vectors to produce a third vector,

\[ \mathbf{v} + \mathbf{w} \;\longmapsto\; \mathbf{u}. \]

In $\mathbb{R}^2$ this is defined by adding vectors component-wise,

\[ \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \;\equiv\; \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \end{pmatrix}. \]

(See figure 2.2.)


Figure 2.2: Vector addition in $\mathbb{R}^2$


Vector addition must obey the required rules. They are:


• Zero Vector. Every vector space must have a unique zero vector, denoted by
boldface 0, which has the property that v + 0 = 0 + v = v, for all v in
the space. For $\mathbb{R}^2$ the zero vector is $(0, 0)^t$.


• Vector Opposites. For each vector, v, there must be an additive inverse,
i.e., a unique -v, such that v + (-v) = 0. In $\mathbb{R}^2$ the additive inverse of $(x, y)^t$
is $(-x, -y)^t$.


• Commutativity and Associativity. The vectors must satisfy the reasonable
conditions v + w = w + v and (v + w) + u = v + (w + u), for all v, w
and u.


Scalar Multiplication 


We can “scale” a vector by a scalar, that is, we can multiply a vector by a scalar
to produce another vector,

\[ c\,\mathbf{v} \;\longmapsto\; \mathbf{w}. \]

In $\mathbb{R}^2$ this is defined in the expected way,

\[ c \begin{pmatrix} x \\ y \end{pmatrix} \;\equiv\; \begin{pmatrix} cx \\ cy \end{pmatrix}. \]

(See figure 2.3.)


Scalar multiplication must obey the required rules. They are:


• Scalar Identity. 1v = v, for any vector v.

• Associativity. (ab)v = a(bv), for any vector v and scalars a, b.

• Distributivity. c(v + w) = cv + cw and (c + d)v = cv + dv, for any
vectors v, w and scalars c, d.
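The two operations and a few of their rules can be tried out directly. The following is a minimal sketch in which vectors in R^2 are modeled as Python tuples; the function names `add` and `scale` and the sample vectors are my own, chosen with integer components so the comparisons are exact.

```python
def add(v, w):
    """Component-wise vector addition in R^2."""
    return (v[0] + w[0], v[1] + w[1])

def scale(c, v):
    """Scalar multiplication in R^2."""
    return (c * v[0], c * v[1])

v, w, u = (3, -7), (1, 2), (-4, 5)
zero = (0, 0)

print(add(v, zero) == v)                                   # zero vector
print(add(v, scale(-1, v)) == zero)                        # additive inverse
print(add(v, w) == add(w, v))                              # commutativity
print(add(add(v, w), u) == add(v, add(w, u)))              # associativity
print(scale(2, add(v, w)) == add(scale(2, v), scale(2, w)))  # distributivity
```

Checking a handful of vectors is of course not a proof of the axioms, only a way to see them in action.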


[Exercise. Assuming 


=(3)- = (84). (6). 


(a) verify the associativity of vector addition using these three vectors, 
(b) multiply each by the scalar 1/7, and 
(c) verify the distributivity using the first two vectors and the scalar 1/7.]


Figure 2.3: Scalar multiplication in $\mathbb{R}^2$


2.2.2 Optional Equipment: Inner Products, Modulus and Orthogonality


The vector spaces we encounter will have a metric — a way to measure distances. The 
metric is a side effect of a dot product, and we define this optional feature now. 


[Caution. The phrase “optional equipment” means that not every vector space
has an inner product, not that this is optional reading. It is crucial that you master
this material for quantum mechanics and quantum computing since all of our vector
spaces will have inner products and we will use them constantly.]


Dot (or Inner) Product 


When a vector space has this feature, it provides a way to multiply two vectors in
order to produce a scalar,

\[ \mathbf{v} \cdot \mathbf{w} \;\longmapsto\; c. \]

In $\mathbb{R}^2$ this is called the dot product, but in other contexts it may be referred to as an
inner product. There can be a difference between a dot product and an inner product
(in complex vector spaces, for example), so don't assume the terms are synonymous.
However, for $\mathbb{R}^2$ they are, with both defined by

\[ \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \;\equiv\; x_1 x_2 + y_1 y_2. \]

Inner products can be defined differently in different vector spaces. However they
are defined, they must obey certain properties to get the title inner or dot product.
I won't burden you with them all, but one that is very common is a distributive
property,

\[ \mathbf{v} \cdot (\mathbf{w}_1 + \mathbf{w}_2) \;=\; \mathbf{v} \cdot \mathbf{w}_1 + \mathbf{v} \cdot \mathbf{w}_2. \]



[Exercise. Look up and list another property that an inner product must obey.]


[Exercise. Prove that the dot product in $\mathbb{R}^2$, as defined above, obeys the distributive
property.]


When we get to Hilbert spaces, there will be more to say about this. 


Length (or Modulus or Norm) 


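An inner product, when present in a vector space, confers a length on each vector
(a.k.a. modulus or norm), denoted by either $|\mathbf{v}|$ or $\|\mathbf{v}\|$. The length (in most situations,
including ours) of each vector must be a non-negative real number, even for
complex vector spaces coming later. So,

\[ \|\mathbf{v}\| \;\geq\; 0. \]

For our friend $\mathbb{R}^2$, if

\[ \mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \]

then the length of v is defined to be

\[ \|\mathbf{v}\| \;\equiv\; \sqrt{\mathbf{v} \cdot \mathbf{v}} \;=\; \sqrt{ \begin{pmatrix} x \\ y \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} } \;=\; \sqrt{x^2 + y^2}, \]

positive square root assumed.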


Orthogonality 


If two vectors, v and w, have a dot product which is zero,

\[ \mathbf{v} \cdot \mathbf{w} = 0, \]

we say that they are orthogonal or mutually perpendicular. In the relatively visualizable
spaces $\mathbb{R}^2$ and $\mathbb{R}^3$, we can imagine the line segments 0v and 0w forming
right angles with one another. (See figure 2.4.)


Figure 2.4: Orthogonal vectors
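The dot product, the length it induces, and the orthogonality test fit in a few lines of Python. This is only a sketch: the function names `dot`, `norm` and `is_orthogonal`, and the sample vectors, are my own illustrations rather than anything fixed by these notes.

```python
import math

def dot(v, w):
    """Dot product in R^2: x1*x2 + y1*y2."""
    return v[0] * w[0] + v[1] * w[1]

def norm(v):
    """Length of v, the positive square root of v . v."""
    return math.sqrt(dot(v, v))

def is_orthogonal(v, w):
    """True when the dot product vanishes (exact test, fine for integer components)."""
    return dot(v, w) == 0

print(dot((1, 0), (0, 1)), is_orthogonal((1, 0), (0, 1)))    # 0, orthogonal
print(dot((1, 1), (1, -1)), is_orthogonal((1, 1), (1, -1)))  # 0, orthogonal
print(dot((1, 1), (1, 0)), is_orthogonal((1, 1), (1, 0)))    # 1, not orthogonal
print(norm((3, 4)), norm((1, 1)))                            # lengths
```

With floating-point components, an exact comparison with zero is too strict; comparing the absolute value of the dot product against a small tolerance would be the usual adjustment.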




Examples of Orthogonal Vectors. A well known orthogonal pair is 


to) Gf. 
1G) GE 


[Exercise. Use the definition of dot product to show that each of the two sets of
vectors, listed above, is orthogonal.]


Another is 


Examples of Non-Orthogonal Vectors. An example of a set that is not 


erry 


More Vocabulary. If a set of vectors is both orthogonal and each vector in the 
set has unit length (norm = 1), it is called orthonormal. 


- (0). (9) 
(1). (8) 


is orthogonal (check it) but not orthonormal. 


is orthonormal, while the set 


Some Unexpected Facts 


• A length seems like it should always be non-negative, but this won't be true in
some common vector spaces of first-year physics (hint: special relativity).


• If $\mathbf{v} \neq \mathbf{0}$, we would expect $\mathbf{v} \cdot \mathbf{v} > 0$, strictly greater than zero. However, not so
when we deal with the mod-2 vector spaces of quantum computing.


These two unexpected situations suggest that we need a term to distinguish the 
typical, well-behaved inner products from those that are off-beat or oddball. That 
term is positive definiteness. 


Positive Definite Property. An inner product is usually required to be positive
definite, meaning


• $\mathbf{v} \cdot \mathbf{v} \geq 0$, and

• $\mathbf{v} \neq \mathbf{0} \;\Rightarrow\; \|\mathbf{v}\| > 0$.


When a proposed inner product fails to meet these conditions, it is often not granted 
the status inner (or dot) product but is instead called a pairing. When we come 
across a pairing, I’ll call it to your attention, and we can take appropriate action. 


[Exercise. Assume 


(3 _ (500 eo 
P= | op | 4 a= i cae and X=| 4 | - 


(a) Compute the dot product of every pair in the group.

(b) Compute the norm of every vector in the group.

(c) Prove that the only vector in $\mathbb{R}^2$ which has zero norm is the zero vector, 0.

(d) For each vector, find an (i) orthogonal and (ii) non-orthogonal companion.]


2.2.3 A Bigger Vector Space: $\mathbb{R}^3$


Even though we have not learned what dimension means, we have an intuition that $\mathbb{R}^2$
is somehow two-dimensional: we graph its vectors on paper and it seems flat. Let's
take a leap now and see what it's like to dip our toes into the higher dimensional
vector space $\mathbb{R}^3$ (over $\mathbb{R}$). $\mathbb{R}^3$ is the set of all triples (or 3-tuples) of real numbers,

\[ \mathbb{R}^3 \;\equiv\; \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x, y, z \in \mathbb{R} \right\}. \]

(This only tells us what the vectors are, but I'm telling you now that the scalars
continue to be $\mathbb{R}$.)


Figure 2.5: A vector in $\mathbb{R}^3$



Examples of vectors in $\mathbb{R}^3$ are

\[ \begin{pmatrix} 2 \\ -1 \\ 3.5 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} \pi/2 \\ 100 \\ -9 \end{pmatrix}. \]

It's harder to graph these objects, but it can be done using three-D sketches (see
figure 2.5).


[Exercise. Repeat some of the examples and exercises we did for $\mathbb{R}^2$ for this
richer vector space. In particular, define vector addition, scalar multiplication, dot
products, etc.]


2.2.4 Preview of a Complex Vector Space: $\mathbb{C}^2$


We'll roll out complex vector spaces in their full glory when we come to the Hilbert
space lesson, but it won't hurt to put our cards on the table now. The most useful
vector spaces in this course will be ones in which the scalars are the complex numbers.
The simplest example is $\mathbb{C}^2$.


Definition. $\mathbb{C}^2$ is the set of all ordered pairs of complex numbers,

\[ \mathbb{C}^2 \;\equiv\; \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{C} \right\}. \]

You can verify that this is a vector space and guess its dimension and how the inner
product is defined. Then check your guesses online or look ahead in these lectures.
All I want to do here is introduce $\mathbb{C}^2$ so you'll be ready to see vectors that have
complex components.


2.3 Bases for a Vector Space 


If someone were to ask, “What is the single most widely used mathematical concept
in quantum mechanics and quantum computing?” I would answer, “the vector basis.”
(At least that’s my answer today.) It is used or implied at every turn, so much so, 
that one forgets it’s there. But it is there, always. Without a solid understanding of 
what a basis is, how it affects our window into a problem, and how different bases 
relate to one another, one cannot participate in the conversation. It’s time to check 
off the “what is a basis” box. 


Incidentally, we are transitioning now to properties that are not axioms. They are 
consequences of the axioms, i.e., we can prove them. 


2.3.1 Linear Combination (or Superposition) 


Whenever you have a finite set of two or more vectors, you can combine them using
both scalar multiplication and vector addition. For example, with two vectors, v, w,
and two scalars, a, b, we can form the vector u by taking

\[ \mathbf{u} = a\mathbf{v} + b\mathbf{w}. \]

Mathematicians call it a linear combination of v and w. Physicists call it a superposition
of the two vectors. Superposition or linear combination, the idea is the same.
We are “weighting” each of the vectors, v and w, by scalar weights, a and b, respectively,
then adding the results. In a sense, the two scalars tell the relative amounts
of each vector that we want in our result. (However, if you lean too heavily on that
metaphor, you will find yourself doing some fast talking when your students ask you
about negative numbers and complex scalars, so don't take it too far.)


The concept extends to sets containing more than two vectors. Say we have a
finite set of n vectors $\{\mathbf{v}_k\}$ and corresponding scalars $\{c_k\}$. Now a linear combination
would be expressed either long-hand or using summation notation,

\[ \mathbf{u} \;=\; c_0 \mathbf{v}_0 + c_1 \mathbf{v}_1 + \ldots + c_{n-1} \mathbf{v}_{n-1} \;=\; \sum_{k=0}^{n-1} c_k \mathbf{v}_k. \]


Linear combinations are fundamental to understanding vector bases. 
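A linear combination is simple to compute once vectors are stored as tuples. The sketch below is an illustration only; the function name `linear_combination` and the sample scalars and vectors are my own.

```python
def linear_combination(scalars, vectors):
    """Return sum_k c_k v_k for same-length vectors given as tuples."""
    n = len(vectors[0])
    result = [0] * n
    for c, v in zip(scalars, vectors):
        for i in range(n):
            result[i] += c * v[i]      # add the weighted component
    return tuple(result)

# weights 15 and 3 on the vectors (1, 0) and (0, 1)
print(linear_combination([15, 3], [(1, 0), (0, 1)]))

# weights 2 and -1 on the vectors (1, 2) and (3, 4)
print(linear_combination([2, -1], [(1, 2), (3, 4)]))
```

The second call shows that negative weights are perfectly legal, which is one reason the "relative amounts" metaphor should not be pushed too far.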


[Exercise. Form three different linear combinations of the vectors 


(5). (88). 2-(2), 


showing the original scalars you chose for each one. Reduce each linear combination
into the simplest form you can.]


[Exercise. Play around with those three to see if you can express any one of
them as a linear combination of the other two. Hint: You can, no matter how you
split them up.]


2.3.2 The Notion of Basis 


One can find a subset of the vectors (usually a very tiny fraction of them compared 
to the infinity of vectors in the space) which can be used to "generate" all the other
vectors through linear combinations. When we have such a subset that is, in a sense, 
minimal (to be clarified), we call it a basis for the space. 


The Natural Basis 


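In $\mathbb{R}^2$, only two vectors are needed to produce, through linear combination, all the
rest. The most famous basis for this space is the standard (or natural or preferred)
basis, which I'll call A for now,

\[
A \;=\; \{ \hat{\mathbf{x}}, \hat{\mathbf{y}} \} \;=\; \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} .
\]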




For example, the vector $(15, 3)^t$ can be expressed as the linear combination

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; 15\,\hat{\mathbf{x}} + 3\,\hat{\mathbf{y}} .
\]


Note. In the diagram that follows, the vector pictured is not intended to be $(15, 3)^t$.
(See figure 2.6.)


Figure 2.6: A vector expressed as linear combination of $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$


Notation. It is common to use the small-hat, ^, on top of a vector, as in $\hat{\mathbf{x}}$,
to denote a vector that has unit length. For the time being, we'll reserve the two
symbols, $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$, for the vectors in the natural basis of $\mathbb{R}^2$.


Properties of a Basis 


Hang on, though, because we haven’t actually defined a basis yet; we only saw an 
example of one. A basis has to have two properties called (i) linear independence and 
(ii) completeness (or spanning).


(i) Linear Independence. A set of vectors is linearly independent if we cannot 
express any one of them as a linear combination of the others. If we can express 
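one as a linear combination of the others, then it is called a linearly dependent
set. A basis must be linearly independent.


We ask whether the set A', made up of the following three vectors, is a basis. If
so, it must be linearly independent. Is it?


\[
A' \;=\; \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\}
\]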




Since we can express $(3, 2)^t$ as

\[
\begin{pmatrix} 3 \\ 2 \end{pmatrix} \;=\; 3\,\hat{\mathbf{x}} + 2\,\hat{\mathbf{y}} ,
\]

A' is not a linearly independent set of vectors.


Restating, to exercise our new vocabulary, $(3, 2)^t$ is a weighted sum of $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$,
where 3 is the weight of $\hat{\mathbf{x}}$, and 2 is the weight of $\hat{\mathbf{y}}$. Thus, it is a
linear combination of those two and therefore linearly dependent on them. The
requirement that every basis be linearly independent means that there cannot be
even one vector in the basis which can be expressed as a linear combination of
the others, so A' cannot be a basis for $\mathbb{R}^2$.


[Exercise. Is it also true that $(1, 0)^t$ can be expressed as a linear combination
of the other two, $(0, 1)^t$ and $(3, 2)^t$? Hint: Yes. See if you can fiddle with the
equation to find out what scalars are needed to get $(1, 0)^t$ as a linear combination
of the other two.]


[Exercise. Show that the zero vector $(0, 0)^t$ is never part of a linearly independent
set. Hint: Find the coefficients of the remaining vectors that easily
produce $(0, 0)^t$ as a linear combination.]


(ii) Completeness (or Spanning). Completeness of a set of vectors means that 
the set spans the entire vector space, which in turn means that any vector in 
the space can be expressed as a linear combination of vectors in that set. A 
basis must span the space. 


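Does the following set of two vectors, A'', span $\mathbb{R}^3$?

\[
A'' \;=\; \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\}
\]

Since we cannot express $(3, -1, 2)^t$ as a linear combination of these two, i.e.,

\[
\begin{pmatrix} 3 \\ -1 \\ 2 \end{pmatrix} \;\neq\; x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
\]

for any $x, y \in \mathbb{R}$, A'' does not span the set of vectors in $\mathbb{R}^3$. In other words, A''
is not complete.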


[Exercise. Find two vectors, such that if either one (individually) were added to
A'', the augmented set would span the space.]


[Exercise. Find two vectors, such that, if either one were added to A'', the
augmented set would still fail to be a spanning set.]
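
Both properties can be checked mechanically. The sketch below uses the rank of the set (the number of independent vectors found by Gaussian elimination), a tool this section does not introduce, so treat it as an outside technique brought in for illustration. The function names are assumptions.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (treated as rows), by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in vec] for vec in vectors]
    r = 0                                   # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    # Independent exactly when no vector is redundant.
    return rank(vectors) == len(vectors)

def spans(vectors, dim):
    # Complete exactly when the set reaches every direction of the dim-dimensional space.
    return rank(vectors) == dim

A_prime = [(1, 0), (0, 1), (3, 2)]          # the set A' from the text
A_double_prime = [(1, 0, 0), (0, 1, 0)]     # the set A'' from the text

print(is_linearly_independent(A_prime), spans(A_prime, 2))
print(is_linearly_independent(A_double_prime), spans(A_double_prime, 3))
```

A' spans but is not independent, while A'' is independent but does not span; a basis needs both tests to pass.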




Definition of Basis 


We now know enough to formally define a vector space basis. 


Basis. A basis is a set of vectors that is linearly independent and 
complete (spans the space). 


Another way to phrase it is that a basis is a minimal spanning set, meaning we
can't remove any vectors from it without losing the spanning property.


Theorem. All bases for a given vector space have the same number of elements.


This is easy to prove (you can do it as an [exercise] if you wish). One consequence
is that all bases for $\mathbb{R}^2$ must have two vectors, since we know that the natural basis
has two elements. Similarly, all bases for $\mathbb{R}^3$ must have three elements.


Definition. The dimension of a vector space is the number of elements in any 
basis. 


[Exercise. Describe some vector space (over $\mathbb{R}$) that is 10-dimensional. Hint:
The set of five-tuples, $\{ (x_0, x_1, x_2, x_3, x_4)^t \mid x_k \in \mathbb{R} \}$, forms a five-dimensional vector
space over $\mathbb{R}$.]


Here is an often used fact that we should prove. 


Theorem. If a set of vectors $\{\mathbf{v}_k\}$ is orthonormal, it is necessarily linearly
independent.


Proof. We'll assume the theorem is false and arrive at a contradiction. So, we
pretend that $\{\mathbf{v}_k\}$ is an orthonormal collection, yet one of them, say $\mathbf{v}_0$, is a linear
combination of the others,

\[
\mathbf{v}_0 \;=\; \sum_{k=1}^{n-1} c_k \mathbf{v}_k ,
\]

where not all the $c_k$ can be 0, since that would imply that $\mathbf{v}_0 = \mathbf{0}$, which cannot
be a member of any orthonormal set (remember from earlier?). By orthonormality
$\mathbf{v}_0 \cdot \mathbf{v}_k = 0$ for all $k \neq 0$, but of course $\mathbf{v}_0 \cdot \mathbf{v}_0 = 1$, so we get the following chain of
equalities:

\[
1 \;=\; \mathbf{v}_0 \cdot \mathbf{v}_0 \;=\; \mathbf{v}_0 \cdot \sum_{k=1}^{n-1} c_k \mathbf{v}_k \;=\; \sum_{k=1}^{n-1} c_k \left( \mathbf{v}_0 \cdot \mathbf{v}_k \right) \;=\; 0 ,
\]

a contradiction. QED


Notice that even if the vectors were orthogonal, weaker than their being orthonor- 
mal, they would still have to be linearly independent. 




Alternate Bases 


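There are many different pairs of vectors in $\mathbb{R}^2$ which can be used as a basis. Every
basis has exactly two vectors (by our theorem). Here is an alternate basis for $\mathbb{R}^2$:

\[
B \;=\; \{ \mathbf{b}_0, \mathbf{b}_1 \} \;=\; \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ -1 \end{pmatrix} \right\} .
\]

For example, the vector $(15, 3)^t$ can be expressed as the linear combination

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; (27/5)\,\mathbf{b}_0 + (12/5)\,\mathbf{b}_1 .
\]

[Exercise. Multiply this out to verify that the coefficients, 27/5 and 12/5, work for
that vector and basis.]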


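And here is yet a third basis for $\mathbb{R}^2$:

\[
C \;=\; \{ \mathbf{c}_0, \mathbf{c}_1 \} \;=\; \left\{ \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}, \begin{pmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} \right\} ,
\]

with the vector $(15, 3)^t$ expressible as the linear combination

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; (9\sqrt{2})\,\mathbf{c}_0 + (-6\sqrt{2})\,\mathbf{c}_1 .
\]

[Exercise. Multiply this out to verify that the coefficients, $9\sqrt{2}$ and $-6\sqrt{2}$, work for
that vector and basis.]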
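
If you would rather find such coefficients than just verify them, one route is to solve the two linear equations hidden in v = s b_0 + t b_1. The sketch below does this with the 2 × 2 determinant formula (Cramer's rule, which the text only reaches in a later section). It takes b_0 = (1, 1)^t and b_1 = (4, −1)^t, the reading of basis B that is consistent with the stated coefficients 27/5 and 12/5; the function name `coordinates_2d` is an assumption.

```python
from fractions import Fraction

def coordinates_2d(v, b0, b1):
    """Solve v = s*b0 + t*b1 for (s, t); all inputs are pairs of integers."""
    det = b0[0] * b1[1] - b1[0] * b0[1]
    if det == 0:
        raise ValueError("b0 and b1 are linearly dependent, so they are not a basis")
    s = Fraction(v[0] * b1[1] - b1[0] * v[1], det)
    t = Fraction(b0[0] * v[1] - v[0] * b0[1], det)
    return s, t

b0, b1 = (1, 1), (4, -1)        # assumed entries of the basis B
s, t = coordinates_2d((15, 3), b0, b1)
print(s, t)                     # the coordinates of (15, 3) relative to B
```

Exact fractions are used so that 27/5 and 12/5 come out without rounding. The basis C has irrational entries and would need floating-point or symbolic arithmetic instead.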


Note. In the diagrams, the vector pictured is not intended to be $(15, 3)^t$. (See
figures 2.7 and 2.8.)


Figure 2.7: A vector expressed as linear combination of $\mathbf{b}_0$ and $\mathbf{b}_1$


Figure 2.8: A vector expressed as linear combination of $\mathbf{c}_0$ and $\mathbf{c}_1$


Expanding Along a Basis. When we want to call attention to the specific basis 
we are using to express (i.e., to produce using the wrench of linear combination) a 
vector, v, we would say we are expanding v along the so-and-so basis, e.g., "we are
expanding v along the natural basis," or "let's expand v along the B basis."


Orthonormal Bases 


We are particularly interested in bases whose vectors have unit length and are mutu- 
ally perpendicular. Not surprisingly, because of the definition of orthonormal vectors, 
above, such bases are called orthonormal bases. 


A is orthonormal:      $\hat{\mathbf{x}} \cdot \hat{\mathbf{x}} = \hat{\mathbf{y}} \cdot \hat{\mathbf{y}} = 1$,   $\hat{\mathbf{x}} \cdot \hat{\mathbf{y}} = 0$
B is not orthonormal:  $\mathbf{b}_0 \cdot \mathbf{b}_0 = 2 \neq 1$,   $\mathbf{b}_0 \cdot \mathbf{b}_1 = 3 \neq 0$
C is orthonormal:      ([Exercise])


If the basis vectors are mutually perpendicular but do not have unit length, they are almost as
useful. Such a basis is called an orthogonal basis. If you get a basis to be orthogonal,
your hard work is done; you simply divide each basis vector by its norm in order to
make it orthonormal.
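
The orthonormality test and the "divide by the norm" step can be sketched in a few lines of Python. The helper names, the tolerance, and the sample orthogonal pair (3, 4), (−4, 3) are assumptions made for illustration. The entries used for B are the reading consistent with the dot products quoted above, and C is left out so as not to spoil the exercise.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_orthonormal(basis, tol=1e-12):
    """True if every pair of basis vectors has dot product 1 (same vector) or 0 (different)."""
    for j, bj in enumerate(basis):
        for k, bk in enumerate(basis):
            expected = 1.0 if j == k else 0.0
            if abs(dot(bj, bk) - expected) > tol:
                return False
    return True

def normalize(v):
    """Divide a non-zero vector by its norm to get a unit vector."""
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)

A = [(1, 0), (0, 1)]
B = [(1, 1), (4, -1)]                       # assumed entries of the basis B
orthogonal = [(3, 4), (-4, 3)]              # hypothetical: perpendicular, but length 5
orthonormalized = [normalize(v) for v in orthogonal]

print(is_orthonormal(A), is_orthonormal(B))
print(is_orthonormal(orthogonal), is_orthonormal(orthonormalized))
```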


2.3.3 Coordinates of Vectors 


As a set of vectors, $\mathbb{R}^2$ consists literally of ordered pairs of real numbers, { (x, y) } or
if you like, { $(x, y)^t$ }. Those are the vector objects of the vector space. However, each
vector also has coordinates relative to some basis, namely the specific scalars needed
to express that vector as a linear combination of the basis vectors. For example, if v
is expanded along the natural basis,

\[
\mathbf{v} \;=\; v_x \hat{\mathbf{x}} + v_y \hat{\mathbf{y}} ,
\]

its coordinates relative to the natural basis are $v_x$ and $v_y$. In other words, the coordinates
are just weighting factors needed to expand that vector in that given basis.


Vocabulary. Sometimes the term coefficient is used instead of coordinate. 


If we have a different basis, like $B = \{\mathbf{b}_0, \mathbf{b}_1\}$, and we expand v along that basis,

\[
\mathbf{v} \;=\; v_0 \mathbf{b}_0 + v_1 \mathbf{b}_1 ,
\]

then its coordinates (coefficients) relative to the B basis are $v_0$ and $v_1$.


As you see, the same vector, v, will have different coordinates relative to different
bases. However, the ordered pair that describes the pure object, the vector $(x, y)^t$
itself, does not change.


For some vector spaces, $\mathbb{R}^2$ being a prime example, it is easy to confuse the
coefficients with the actual vector. Relative to the natural (standard) basis, the
coordinates happen to be the same as the numbers in the vector itself, so the vector
and the coordinates, when written in parentheses, are indistinguishable. If we need
to clarify that we are expressing coordinates of a vector, and not the raw vector itself,
we can label the ordered pair appropriately. Such a label would be the name of the
basis being used. It's easier to do than it is to say. Here is the vector $(15, 3)^t$, first
expanded along the natural basis,

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; 15\,\hat{\mathbf{x}} + 3\,\hat{\mathbf{y}} ,
\]

and now shown along with its coordinates in the standard basis,

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; \begin{pmatrix} 15 \\ 3 \end{pmatrix}_{A} .
\]

For the non-preferred bases of the previous section, the coordinates for $(15, 3)^t$, similarly
labeled, are written as

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; \begin{pmatrix} 27/5 \\ 12/5 \end{pmatrix}_{B} \;=\; \begin{pmatrix} 9\sqrt{2} \\ -6\sqrt{2} \end{pmatrix}_{C} .
\]


Computing Expansion Coefficients along a Basis 


In physics, it is common to use the term expansion coefficients to mean the coordinates
relative to some basis. This terminology seems to be used especially if the basis
happens to be orthonormal, although you may see the term used even if it isn't. For example,

\[
\begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; (9\sqrt{2})\,\mathbf{c}_0 + (-6\sqrt{2})\,\mathbf{c}_1 ,
\]

so $9\sqrt{2}$ and $-6\sqrt{2}$ are the "expansion coefficients" of $(15, 3)^t$ along the C basis.


If our basis happens to be orthonormal, there is a special trick we can use to find 
the coordinates — or expansion coefficients — relative to that orthonormal basis. 


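Let's say we would like to expand v along the C basis, but we only know it in the
natural basis, A. We also know the C basis vectors, $\mathbf{c}_0$ and $\mathbf{c}_1$, in the natural basis
A. In other words, we would like, but don't yet have,

\[
\mathbf{v} \;=\; \alpha_0 \mathbf{c}_0 + \alpha_1 \mathbf{c}_1 ,
\]

i.e., we don't know the scalar weights $\alpha_0$, $\alpha_1$. An easy way to find the $\alpha_k$ is to "dot"
v with the two C basis vectors. We demonstrate why this works by computing $\alpha_0$:

\[
\begin{aligned}
\mathbf{c}_0 \cdot \mathbf{v} &= \mathbf{c}_0 \cdot \left( \alpha_0 \mathbf{c}_0 + \alpha_1 \mathbf{c}_1 \right) \\
&= \mathbf{c}_0 \cdot \alpha_0 \mathbf{c}_0 + \mathbf{c}_0 \cdot \alpha_1 \mathbf{c}_1 \\
&= \alpha_0 \left( \mathbf{c}_0 \cdot \mathbf{c}_0 \right) + \alpha_1 \left( \mathbf{c}_0 \cdot \mathbf{c}_1 \right) \\
&= \alpha_0 (1) + \alpha_1 (0) \;=\; \alpha_0
\end{aligned}
\]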


[Exercise. Justify each equality in this last derivation using the axioms of vector 
space and assumption of orthonormality of C.] 


[Exercise. This trick works almost as well with an orthogonal basis which does
not happen to be orthonormal. We just have to add a step; when computing the
expansion coefficient for the basis vector, $\mathbf{c}_k$, we must divide the dot product by $|\mathbf{c}_k|^2$.
Prove this and give an example.]


Thus, dotting by $\mathbf{c}_0$ produced the expansion coefficient $\alpha_0$. Likewise, to find $\alpha_1$,
just dot v with $\mathbf{c}_1$.


For the specific vector $\mathbf{v} = (15, 3)^t$ and the basis C, let's verify that this actually
works for, say, the 0th expansion coefficient:

\[
\alpha_0 \;=\; \mathbf{c}_0 \cdot \mathbf{v} \;=\; \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} \cdot \begin{pmatrix} 15 \\ 3 \end{pmatrix} \;=\; 15\sqrt{2}/2 + 3\sqrt{2}/2 \;=\; 9\sqrt{2} \quad \checkmark
\]


The reason we could add the check-mark is that this agrees with the expression we 
had earlier for the vector v expanded along the C basis. 
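
The same check can be run numerically for both coefficients at once. This is a minimal sketch; the variable names are assumptions, and floating-point arithmetic means the results match 9√2 and −6√2 only up to rounding.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s = math.sqrt(2) / 2
c0 = (s, s)          # first vector of the orthonormal basis C
c1 = (-s, s)         # second vector of the orthonormal basis C
v = (15, 3)

alpha0 = dot(c0, v)  # expansion coefficient along c0
alpha1 = dot(c1, v)  # expansion coefficient along c1

# Rebuild v from its expansion to confirm that the coefficients are right.
rebuilt = tuple(alpha0 * p + alpha1 * q for p, q in zip(c0, c1))
print(alpha0, alpha1, rebuilt)
```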


Remark. We actually did not have to know things in terms of the natural basis 
A in order for this to work. If we had known the coordinates of v in some other basis 
(it doesn't even have to be orthonormal), say B, and we also knew coordinates of the
C basis vectors with respect to B, then we could have done the same thing.


[Exercise. If you're up for it, prove this.]


2.3.4 Independence of Basis (or Not?) 


The definition of inner product assumed that the n-tuples were, themselves, the 
vectors and not some coordinate representation expanded along a specific basis. Now 




that we know a vector can have different coordinates relative to different bases, we 
ask "is the inner product formula that I gave independent of basis?", i.e., can we
use the coordinates, $(d_x, d_y)$ and $(e_x, e_y)$, relative to some basis, rather than the
numbers in the raw vector ordered pair, to compute the inner product using the
simple $d_x e_x + d_y e_y$? In general the answer is no.

[Exercise. Compute the length of the vector $(15, 3)^t$ by dotting it with itself. Now
do the same thing, but this time compute using that vector's coordinates relative to
the three bases A, B and C through use of the imputed formula given above. Do you
get the same answers? Which bases' coordinates give the right inner-product answer?]

However, when working with orthonormal bases, the answer is yes, one can use 
coordinates relative to that basis, instead of the pure vector coordinates, and apply 
the simple formula to the coordinates. 

[Exercise. Explain the results of the last exercise in light of this new assertion.] 

[Exercise. See if you can prove the last assertion.] 

Note. There is a way to use non-orthonormal basis coordinates to compute dot 


products, but one must resort to a more elaborate matrix multiplication, the details 
of which we shall skip (but it’s a nice [Exercise] should you wish to attempt it). 
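
For readers who want to see what that "more elaborate matrix multiplication" might look like, here is a hedged sketch. It builds the matrix of pairwise dot products of the basis vectors (usually called the Gram matrix, a name the text does not use) and sandwiches it between the two coordinate lists. The entries of B are the reading assumed earlier in this section, and the function names are illustrative.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_matrix(basis):
    """G[j][k] = b_j . b_k, computed from the raw basis vectors."""
    return [[dot(bj, bk) for bk in basis] for bj in basis]

def dot_from_coordinates(d, e, basis):
    """Dot product of two vectors given only their coordinates d, e in `basis`."""
    G = gram_matrix(basis)
    return sum(d[j] * G[j][k] * e[k]
               for j in range(len(d)) for k in range(len(e)))

B = [(1, 1), (4, -1)]                      # assumed entries of the basis B
d = (Fraction(27, 5), Fraction(12, 5))     # coordinates of (15, 3) relative to B

naive = dot(d, d)                          # simple formula applied to B-coordinates
corrected = dot_from_coordinates(d, d, B)  # formula with the Gram matrix inserted
print(naive, corrected, dot((15, 3), (15, 3)))
```

For an orthonormal basis the Gram matrix is the identity, which is one way to see why the simple formula works there.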


2.4 Subspaces 


The set of vectors that are scalar multiples of a single vector, such as 


a eeEr Spel eee 


is, itself, a vector space. It can be called a subspace of the larger space R?. As 
an exercise, you can confirm that any two vectors in this set, when added together, 
produce a third vector which is also in the set. Same with scalar multiplication. So 
the subspace is said to be closed under vector addition and scalar multiplication. In 
fact, that’s what we mean by a subspace. 




Vector Subspace. A subspace of a (parent) vector space is a subset of 
the parent vectors that is closed under the vector/scalar operations. 


2.5 Higher Dimensional Vector Spaces 


What we just covered lays the groundwork for more exotic — yet commonly used — 
vector spaces in physics and engineering. That’s why I first listed the ideas and facts 
in the more familiar setting of $\mathbb{R}^2$ (and $\mathbb{R}^3$). If you can abstract these ideas, rather
than just memorize their use in $\mathbb{R}^2$ and $\mathbb{R}^3$, it will serve you well, even in this short
quantum computing course. 




Axioms 


We can extend directly from ordered pairs or triples to ordered n-tuples for any
positive integer n:

\[
\mathbb{R}^n \;\equiv\; \left\{ \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix} \;\middle|\; x_k \in \mathbb{R}, \; k = 0 \text{ to } n-1 \right\}
\]

Set n = 10 and you are thinking about a 10-dimensional space.


The underlying field for $\mathbb{R}^n$ is the same real number system, $\mathbb{R}$, that we used
for $\mathbb{R}^2$. The components are named $x_0, x_1, x_2, x_3, \ldots$, instead of x, y, z, ...
(although, when you deal with relativity or most engineering problems, you'll use the
four-dimensional symbols x, y, z and t, the last meaning time).


[Exercise. Define vector addition and scalar multiplication for $\mathbb{R}^n$ following the
lead set by $\mathbb{R}^2$ and $\mathbb{R}^3$.]


Inner Product 


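The dot product is defined as you would expect. If

\[
\mathbf{a} \;=\; \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}
\quad \text{and} \quad
\mathbf{b} \;=\; \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix} ,
\]

then

\[
\mathbf{a} \cdot \mathbf{b} \;\equiv\; \sum_{k=0}^{n-1} a_k b_k .
\]

[Exercise. Compute the length of the vector $(1, 1, 1, \ldots, 1)^t$ in $\mathbb{R}^n$.]


Basis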


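The standard (or natural or preferred) basis in $\mathbb{R}^n$ is

\[
A \;=\; \left\{ \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \right\}
\;=\; \{ \hat{\mathbf{x}}_0, \hat{\mathbf{x}}_1, \ldots, \hat{\mathbf{x}}_{n-1} \} .
\]

It is clearly orthonormal (why "clearly"?), and any other basis,

\[
B \;=\; \{ \mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_{n-1} \} ,
\]

for $\mathbb{R}^n$ will therefore have n vectors. The orthonormal property would be satisfied by
the alternate basis B if and only if

\[
\mathbf{b}_k \cdot \mathbf{b}_j \;=\; \delta_{kj} .
\]

We defined the last symbol, $\delta_{kj}$, in our complex arithmetic lesson, but since many of
our readers will be skipping one or more of the early chapters, I'll reprise the definition
here.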


Kronecker Delta. $\delta_{kj}$, the Kronecker delta, is the mathematical way
to express anything that is to be 0 unless the index k = j, in which case
it is 1,

\[
\delta_{kj} \;=\; \begin{cases} 1, & \text{if } k = j \\ 0, & \text{otherwise} \end{cases}
\]


Expressing any vector, v, in terms of a basis, say B, looks like

\[
\mathbf{v} \;=\; \sum_{k=0}^{n-1} \alpha_k \mathbf{b}_k ,
\]


and all the remaining properties and definitions follow exactly as in the case of the 
smaller dimensions. 


Computing Expansion Coefficients 


I explained how to compute the expansion coefficient of an arbitrary vector along 
an orthonormal basis. This move is done so frequently in a variety of contexts in 
quantum mechanics and electromagnetism that it warrants being restated here in the 
more general cases. 


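When the basis is orthonormal, we can find the expansion coefficients for a vector

\[
\mathbf{v} \;=\; \sum_{k=0}^{n-1} \alpha_k \mathbf{b}_k ,
\]

by dotting v with the basis vectors one-at-a-time. In practical terms, this means "we
dot with $\mathbf{b}_j$ to get $\alpha_j$,"

\[
\begin{aligned}
\mathbf{b}_j \cdot \mathbf{v} &= \mathbf{b}_j \cdot \sum_{k=0}^{n-1} \left( \alpha_k \mathbf{b}_k \right) \;=\; \sum_{k=0}^{n-1} \mathbf{b}_j \cdot \left( \alpha_k \mathbf{b}_k \right) \\
&= \sum_{k=0}^{n-1} \alpha_k \left( \mathbf{b}_j \cdot \mathbf{b}_k \right) \;=\; \sum_{k=0}^{n-1} \alpha_k \delta_{jk} \;=\; \alpha_j .
\end{aligned}
\]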




2.6 More Exercises 


Prove any of the following that interest you:


1. If you have a spanning set of vectors that is not linearly independent, you can
   find a subset that is a basis. (Argue why/how.) Give an example that demonstrates
   you cannot arbitrarily keep the "correct number" of vectors from the
   original spanning set and expect those to be your basis; you have to select your
   subset carefully.

2. Demonstrate that the basis C, in the examples above, is orthonormal.

3. The Gram-Schmidt Process (Tricky). Starting with an arbitrary basis of a
   subspace, you can construct an orthonormal basis which spans the same set as
   your arbitrary basis. This is not the same as selecting a subset as in exercise 1,
   but involves creating new vectors systematically from the existing basis. Hint:
   You can keep any of the original vectors, as is, but from there, you might not
   be able to keep any of the other vectors in the final, orthonormal, set. Use what
   you know about dot products, perpendicular vectors and projection of vectors.
   Draw pictures.

4. Show that the set of vectors in $\mathbb{R}^2$ which have length < 1 does not constitute a
   vector subspace of $\mathbb{R}^2$.

5. Prove that the set of points on the line y = 3x − 1 does not constitute a vector
   subspace of $\mathbb{R}^2$.

6. Argue that all one-dimensional vector subspaces of $\mathbb{R}^2$ consist of lines through
   the origin.

7. Argue that all two-dimensional subspaces of $\mathbb{R}^3$ consist of planes through the
   origin.


And that does it for our overview of vector spaces. The extension of these concepts
to the more exotic sounding vector spaces ahead (complex vector spaces, Hilbert spaces
and spaces over the finite field $\mathbb{Z}_2$) will be very easy if you understand what's in this
section and have done several of the exercises.




Chapter 3 


Matrices 


3.1 Matrices in Quantum Computing 


If a qubit is the replacement for the classical bit, what replaces the classical logic 
gate? I gave you a sneak peek at the answer in the introductory lesson. There, I 
first showed you the truth table of the conventional logic gate known to all computer 
science freshmen as the AND gate (symbol ∧).


x    y    x ∧ y
0    0      0
0    1      0
1    0      0
1    1      1


Then, I mentioned that in the quantum world these truth tables get replaced by 
something more abstract, called matrices. Like the truth table, a matrix contains 
the rules of engagement when a qubit steps into its foyer. The matrix for a quantum 
operator that we’ll study later in the quarter is 


\[
\frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} ,
\]


which represents a special gate called the second order Hadamard operator. We’ll 
meet that officially in a few weeks. 


Our job today is to define matrices formally and learn the specific ways in which 
they can be manipulated and combined with the vectors we met in the previous chapter.




3.2 Definitions 


Definition of a Matrix. A matrix is a rectangular array of numbers,
variables or pretty much anything. It has rows and columns. Each
matrix has a particular size expressed as [# rows] × [# columns],
for example, 2 × 2, 3 × 4, 7 × 1, 10 × 10, etc.


A 3 × 4 matrix (call it A) could be:

\[
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix}
\]

A 4 × 2 matrix (call it B) might be:

\[
\begin{pmatrix} -1 & -2 \\ -3 & -4 \\ -5 & -6 \\ -7 & -8 \end{pmatrix}
\]

3.2.1 Notation


For this lesson I will number starting with 1, not 0, so

• row 1 = the first row = the top row of the matrix, and
• column 1 = the first column = the left column of the matrix.


I usually name matrices with non-boldface capital letters, A, B, etc. Occasionally, 
I will use boldface, A, B. The context will always be clear. 


Sometimes you'll see the matrix represented abstractly as the (kl)th component
surrounded by parentheses,

\[
\left( A_{kl} \right)
\]

which represents the entire matrix A, whose value in row k and column l is $A_{kl}$. This
doesn't tell you the size, but you'll always know that from the context.


I'll typically use m × n or p × q to describe the size of some general matrix and
n × n or p × p when I want to specify that we are dealing with a square matrix, i.e.,
one that has the same number of rows as columns.


3.3 Matrix Multiplication


Matrix multiplication will turn out to be sensitive to order. In math lingo, it is not
a commutative operation; in general,

\[
AB \;\neq\; BA .
\]

Therefore, it's important that we note the order of the product in the definition.
First, we can only multiply the matrices A (n × p) and B (q × m) if p = q. So we will
only define the product AB for two matrices of sizes, A (n × p) and B (p × m). The
size of the product will be n × m. Symbolically,

\[
(n \times p) \cdot (p \times m) \;=\; (n \times m) .
\]

Note that the "inner" dimension gets annihilated, leaving the "outer" dimensions to
determine the size of the product.


3.3.1 Row x Column 


We start by defining the product of a row by a column, which is just a short way of
saying a (1 × l) matrix times an (l × 1) matrix, i.e., two "1-dimensional" matrices. It
is the simple dot product of the two entities as if they were vectors,

\[
\begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} -5 \\ 6 \\ -7 \\ 8 \end{pmatrix}
\;=\; (1)(-5) + (2)(6) + (3)(-7) + (4)(8) \;=\; 18 .
\]

This is the definition of matrix multiplication in the special case when the first is a
row vector and the second is a column vector. But that definition is used repeatedly
to generate the product of two general matrices, coming up next. Here's an example
when the vectors happen to have complex numbers in them.


(lee eee | = @(-35) + @@6) + @)(-4i) + (2-3:1)(83) 


191073 


[For Advanced Readers. As you see, this is just the sum of the simple products
of the coordinates, even when the numbers are complex. For those of you already
familiar with complex inner products (a topic we will cover next week), please note
that this is not a complex inner product; we do not take the complex conjugate of either
vector. Even if the matrix entries are complex numbers, we take the ordinary complex
product of the corresponding elements and add them.]
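
A short Python sketch makes the "no conjugation" point concrete. The real example reuses the row and column from the text; the complex entries are hypothetical, and the conjugated version (conjugating the first factor, which is one common convention for a complex inner product) is shown only for contrast.

```python
def row_times_column(row, column):
    """Product of a 1 x l row by an l x 1 column: sum of plain products, no conjugates."""
    if len(row) != len(column):
        raise ValueError("row and column must have the same length")
    return sum(r * c for r, c in zip(row, column))

real_product = row_times_column([1, 2, 3, 4], [-5, 6, -7, 8])

# Hypothetical complex entries, chosen so the arithmetic is easy to follow.
row = [1j, 2]
col = [1j, 3]
plain = row_times_column(row, col)                              # (i)(i) + (2)(3)
conjugated = sum(r.conjugate() * c for r, c in zip(row, col))   # for contrast only

print(real_product, plain, conjugated)
```

The two complex results differ, which is why the distinction matters once complex inner products arrive.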


3.3.2 Definition of Matrix Multiplication 


The full definition of the product of two matrices, AB, is more-or-less forced on us by
everything we have said so far. Based on the required size of the product and the
definition of the product of two 1-dimensional matrices, we must define the (kl)th
element of the answer matrix to be the dot product of the kth row of A with the lth
column of B. Let's look at it graphically before we see the formal definition.


We illustrate the computation of the 1-1 element of an answer matrix in Figure 3.1 
and the computation of the 2-2 element of the same answer matrix in Figure 3.2. 


Figure 3.1: Dot-product of the first row and first column yields element 1-1


Figure 3.2: Dot-product of the second row and second column yields element 2-2


(In both figures, A and B are the 3 × 4 and 4 × 2 example matrices given earlier and
C = AB; the 1-1 element of C is −50 and the 2-2 element is −140.)


The Formal Definition of Matrix Multiplication. If A is an n × p matrix,
and B is a p × m matrix, then C = AB is an n × m matrix, whose (kl)th element is
given by

\[
C_{kl} \;=\; (AB)_{kl} \;\equiv\; \sum_{j=1}^{p} A_{kj} B_{jl} , \qquad (3.1)
\]

where k = 1, ..., n and l = 1, ..., m.
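
Formula (3.1) is short enough to transcribe almost literally. In this sketch matrices are lists of rows, the function name `matmul` is an assumption, and Python's 0-based indices stand in for the 1-based k, j, l of the text.

```python
def matmul(A, B):
    """C[k][l] = sum_j A[k][j] * B[j][l], for A of size n x p and B of size p x m."""
    n, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("inner dimensions must agree")
    m = len(B[0])
    return [[sum(A[k][j] * B[j][l] for j in range(p)) for l in range(m)]
            for k in range(n)]

# The 3 x 4 and 4 x 2 example matrices from the Definitions section.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
B = [[-1, -2],
     [-3, -4],
     [-5, -6],
     [-7, -8]]

C = matmul(A, B)
print(C[0][0], C[1][1])   # the 1-1 and 2-2 elements illustrated in the figures
```

The product is 3 × 2, matching the rule that the inner dimension (4) disappears.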




[Exercise. Fill in the rest of the elements in the product matrix, C, above.]


[Exercise. Compute the products 


1 2 1 12 8 

—2 0 4 3 11 

5 5 5 00 1 

and 

1 2 0-1 eae 
3 1 éid1 
—2 -1 1 4 001 
5 O 5 O Lt. 


[Exercise. Compute the first product above in the opposite order. Did you get
the same answer?]


While matrix multiplication may not be commutative, it is associative,

\[
A(BC) \;=\; (AB)C ,
\]

a property that is used frequently.
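
Both claims are easy to spot-check on small matrices. The three 2 × 2 matrices below are hypothetical, and a single example of course illustrates associativity rather than proving it.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[k][j] * B[j][l] for j in range(len(B))) for l in range(len(B[0]))]
            for k in range(len(A))]

# Hypothetical matrices, chosen only for illustration.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]
R = [[2, 0], [1, -1]]

PQ = matmul(P, Q)
QP = matmul(Q, P)
print(PQ, QP)                              # order matters: these differ

left = matmul(matmul(P, Q), R)             # (PQ)R
right = matmul(P, matmul(Q, R))            # P(QR)
print(left, right)                         # grouping does not matter: these agree
```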


3.3.3 Product of a Vector by a Matrix 


When a product happens to have an n × 1 matrix in the second position, it is usually
called the product of a vector by a matrix because an n × 1 matrix looks just like an
n-dimensional vector. So, if v is an n-dimensional vector and A is an n × n matrix,
we already know what to do if we see Av. For example,


1 2 3 4 —4 
3 1 1 2.5% = 11 + 2.52 
Os Loe al att —1 


Some Observations 


• Position. You cannot put the vector on the left of the matrix; the dimensions
  would no longer be compatible. You can, however, put the transpose of the
  vector on the left of the matrix, $\mathbf{v}^t A$. That does make sense and it has an
  answer (see exercise, below).


• Linear Transformation. Multiplying a vector by a matrix produces another
  vector. It is our first example of something called a linear transformation,
  which is a special kind of mapping that sends vectors to vectors. Our next
  lecture covers linear transformations in depth.


[Exercise. Using the v and A from the last example, compute the product $\mathbf{v}^t A$.]


[Exercise. Using the same A as above, let the vector $\mathbf{w} = (-1, -2.5i, 1)^t$ and

- compute the product Aw, then

- compute A(v + w), and finally

- compare A(v + w) with Av + Aw. Is this a coincidence?]
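
If you want to experiment before working the exercise by hand, the following sketch runs the same comparison on a different, hypothetical matrix and pair of vectors (not the ones from the exercise). The helper names are assumptions.

```python
def matvec(A, v):
    """Product of a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(v, w):
    return [x + y for x, y in zip(v, w)]

# Hypothetical data with a few complex entries.
A = [[1, 2, 3],
     [0, 1j, 1],
     [2, 0, -1]]
v = [1, 1j, 0]
w = [-1, 2, 2]

lhs = matvec(A, add(v, w))                 # A(v + w)
rhs = add(matvec(A, v), matvec(A, w))      # Av + Aw
print(lhs, rhs)
```

Try other choices of A, v and w before deciding whether the agreement is a coincidence.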


3.4 Matrix Transpose 


We briefly covered the transpose, $\mathbf{v}^t$, of a vector v. This is an action that converts
column vectors to row vectors and vice versa,

\[
\begin{pmatrix} 2, & i, & -\sqrt{3} \end{pmatrix}^t \;=\; \begin{pmatrix} 2 \\ i \\ -\sqrt{3} \end{pmatrix} ,
\qquad
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}^t \;=\; \begin{pmatrix} 1, & 2, & 3 \end{pmatrix} .
\]

That was a special case of the more general operation, namely, taking the transpose
of an entire matrix. The transpose operation creates a new matrix whose rows are
the columns of the original matrix (and whose columns are the rows of the original).
More concisely, if A is the name of our original n × m matrix, its transpose, $A^t$, is the
m × n matrix defined by

\[
\left( A_{kl} \right)^t \;=\; \left( A_{lk} \right) .
\]

In other words, the element in position kl of the original is moved to position lk of
the transpose.


Examples: 

TF Bea Paves 

. 2 6i 10 

5 6 7 8 Sea 

9 10 ll 12 Gap 

1 2 0 -i\' De 

Db =i 0 

at fA = ige a 

5 0 5 0 oe oe 


[Exercise. Make up two matrices, one square and one not square, and show the
transpose of each.]
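
The index swap is a one-liner in code. This sketch stores a matrix as a list of rows and reuses the 3 × 4 example matrix from the Definitions section; the function name `transpose` is an assumption.

```python
def transpose(A):
    """Return the matrix whose (l, k) entry is the (k, l) entry of A."""
    return [[A[k][l] for k in range(len(A))] for l in range(len(A[0]))]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

At = transpose(A)        # a 4 x 3 matrix
print(At)
print(transpose(At))     # transposing twice returns the original
```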




3.5 Matrix Addition and Scalar Multiplication 


Matrices can be added (component-wise) and multiplied by a scalar (apply the scalar 
to all nm elements in the matrix). I’m going to let you be the authors of this section 
in two short exercises. 


[Exercise. Make these definitions precise using a formula and give an example of
each in a 3 × 3 case.]


[Exercise. Show that matrix multiplication is distributive over addition and both
associative and commutative in combination with scalar multiplication, i.e.,

\[
A \left( c B_1 + B_2 \right) \;=\; c \left( A B_1 \right) + \left( A B_2 \right) .
\]
]


3.6 Identity Matrix and Zero Matrix 


Zero. A matrix whose elements are all 0 is called a zero matrix, and can be written
as 0 or (0), e.g.,

\[
(0) \;=\; \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

Clearly, when you add the zero matrix to another matrix, it does not change anything
in the other matrix. When you multiply the zero matrix by another matrix, including
a vector, it will squash it to all 0s.

\[
(0) A \;=\; (0) \quad \text{and} \quad (0) \mathbf{v} \;=\; \mathbf{0}
\]


In words, the zero matrix is the additive identity for matrices. 


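Identity (or Unit). A square matrix whose diagonal elements (upper-left to
lower-right) are all 1 and whose "off-diagonals" are all 0 is called an identity matrix
or unit matrix, and can be written variously as I, Id, 1, $\mathbf{1}$ or $\mathbb{1}$, e.g.,

\[
\mathbb{1} \;=\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad \text{or} \quad
\mathbb{1} \;=\; \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} .
\]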


The identity matrix has the property that when you apply it to (multiply it with) 




another matrix or vector, it leaves that matrix or vector unchanged, e.g., 


fk 0 i ic tea 1 2 Q —-1 
GO: 20; 0) —2 -1 1 4 a —2 -1 1 4 
0 0 1 0 5 0 5 QO 7 5 0 5 0 
00 0 1 o=4: “2 22 2 Dt 2 2 22 

i] 24° Oe OS 10.20. 0 1 2b Oo =) 
a ae a So Oe: 21> s0:. 20 _ —2 -1 1 4 

5 0 5 0 OO ak a 5 0 5 0 
ay. 2 see 2 Oe Oe ab art. 2" Set 


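Notice that multiplication by a unit matrix has the same (non) effect whether it
appears on either side of its co-multiplicand. The rule for both matrices and vectors
is

\[
\mathbb{1} M \;=\; M \mathbb{1} \;=\; M , \qquad
\mathbb{1} \mathbf{v} \;=\; \mathbf{v} \quad \text{and} \quad
\mathbf{v}^t \mathbb{1} \;=\; \mathbf{v}^t .
\]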


In words, the unit matrix is the multiplicative identity for matrices. 


3.7 Determinants 


Associated with every square matrix is a scalar called its determinant. There is very 
little physics, math, statistics or any other science that we can do without a working 
knowledge of the determinant. Let’s check that box now. 


3.7.1 Determinant of a 2 x 2 Matrix 


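Consider any (real or complex) 2 × 2 matrix,

\[
A \;=\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} .
\]

The determinant of A is defined (and written) as follows,

\[
\det(A) \;=\; |A| \;=\; \begin{vmatrix} a & b \\ c & d \end{vmatrix} \;\equiv\; ad - bc .
\]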


For example, 


see 


i 
Lis —5i 


[Exercise. Compute the following determinants: 


1 -—2 = & 
Bal = 
40 1+2 _ 2 

1-2 2 


3.7.2 Determinant of a 3 x 3 Matrix 


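We'll give an explicit definition of the determinant of a 3 × 3 matrix and this will
suggest how to proceed to the n × n case.

\[
\begin{aligned}
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
&\equiv\; a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \\
&=\; a \,(\text{minor of } a) - b \,(\text{minor of } b) + c \,(\text{minor of } c)
\end{aligned}
\]

(Sorry, I had to use the variable name i, not for $\sqrt{-1}$, but to mean the 3-3 element
of the matrix, since I ran out of reasonable letters.) The latter expression defines the minor of
a matrix element to be the determinant of the smaller matrix constructed by crossing
out that element's row and column. (See Figure 3.3.)


Figure 3.3: Minor of a matrix element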


[Exercise. Compute the following determinants: 


1 3 
2 1-2}; = ? 
-1 3 4 
1 0 0 
0 4 14+71| = ? 
O l=—¢ 2 


Compare the last answer with the last 2 x 2 determinant exercise answer. Any 
thoughts? ] 




3.7.3. Determinant of an n x n Matrix 


The 3 × 3 definition tells us to proceed recursively for any square matrix of size n.
We define its determinant as an alternating sum of the first row elements times their
minors,

\[
\begin{aligned}
\det(A) \;=\; |A| \;&\equiv\; A_{11} \,(\text{minor of } A_{11}) - A_{12} \,(\text{minor of } A_{12}) + A_{13} \,(\text{minor of } A_{13}) + \cdots \\
&=\; \sum_{k=1}^{n} (-1)^{k+1} A_{1k} \,(\text{minor of } A_{1k}) .
\end{aligned}
\]

I think you know what's coming. Why row 1? No reason at all (except that every
square matrix has a first row). In the definition above we would say that we expanded
the determinant along the first row, but we could have expanded it along any row,
or any column, for that matter, and gotten the same answer.


However, there is one detail that has to be adjusted if we expand along some other
row (or column). The expression

\[
(-1)^{k+1}
\]

has to be changed if we expand along the jth row (column) rather than the first row
(column). The 1 above becomes j,

\[
(-1)^{k+j} ,
\]

giving the formula

\[
\det(A) \;=\; \sum_{k=1}^{n} (-1)^{k+j} A_{jk} \,(\text{minor of } A_{jk}) ,
\]

good for any row (column) j.
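
The recursive definition maps naturally onto a recursive function. In this sketch the names `det` and `minor` and the 3 × 3 test matrix are assumptions. Indices are 0-based, which leaves the sign unchanged because (−1)^{(j+1)+(k+1)} = (−1)^{j+k}. Expansion by minors is fine for small matrices but becomes very slow as n grows.

```python
def minor(A, j, k):
    """Determinant of A with row j and column k (0-based) crossed out."""
    sub = [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]
    return det(sub)

def det(A, row=0):
    """Determinant of a square matrix, expanded along the given row (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (row + k) * A[row][k] * minor(A, row, k) for k in range(n))

# Hypothetical matrix, chosen only for illustration.
M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

print(det(M, row=0), det(M, row=1), det(M, row=2))   # same value along every row
```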


[Exercise. Rewrite the definition of an n × n determinant expanded along the
3rd column; along the kth column.]


[Exercise. Expand one of the 2 × 2 determinants and one of the 3 × 3 determinants
in the exercises above along a different row or column and confirm that you get the
same answer as before.]


[Exercise. If you dare, prove that the determinant is independent of the choice
of row/column used to expand it.]


3.7.4 Determinants of Products 


det(AB) = det(A) det(B). 


We don’t need to prove this or do exercises, but make a mental note as we need it in 
the following sections. 
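
A quick numerical spot-check of the product rule, using two hypothetical 2 × 2 matrices (an illustration, not a proof):

```python
def det2(M):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul2(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[k][j] * B[j][l] for j in range(2)) for l in range(2)]
            for k in range(2)]

# Hypothetical matrices, chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[2, 0], [1, -1]]

print(det2(matmul2(A, B)), det2(A) * det2(B))   # the two numbers agree
```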




3.8 Matrix Inverses 


Since we have the multiplicative identity (i.e., 1) for matrices we can ask whether,
given an arbitrary square matrix A, the inverse of A can be found. That is, can we
find a B such that AB = BA = 1? The answer is sometimes. Not all matrices have
inverses. If A does have an inverse, we say that A is invertible or non-singular, and
write its inverse as $A^{-1}$. Shown in a couple different notations, just to encourage
flexibility, a matrix inverse must satisfy (and is defined by)

\[
M^{-1} M \;=\; M M^{-1} \;=\; \mathbb{1} \qquad \text{or} \qquad A^{-1} A \;=\; A A^{-1} \;=\; I .
\]


Here is one of two “little theorems” that we’ll need when we introduce quantum 
mechanics. 


Little Inverse Theorem “A”. If $M\mathbf{v} = \mathbf{0}$ for some non-zero vector
$\mathbf{v}$, then M has no inverse.


Proof by Contradiction. We’ll assume that $M^{-1}$ exists and reach a contradiction.
Let $\mathbf{v}$ be some non-zero vector that M “sends” to $\mathbf{0}$. Then,

\[ \mathbf{v} \;=\; I\mathbf{v} \;=\; [M^{-1}M]\,\mathbf{v} \;=\; M^{-1}(M\mathbf{v}) \;=\; M^{-1}\mathbf{0} \;=\; \mathbf{0} , \]

contradicting the choice of $\mathbf{v} \neq \mathbf{0}$. QED


[Exercise. Prove that $\Rightarrow$ can be strengthened to $\Leftrightarrow$ in the above theorem. That
is, prove the somewhat trickier $\Leftarrow$ direction: if M does not have an inverse, there
exists a non-zero $\mathbf{v}$ satisfying $M\mathbf{v} = \mathbf{0}$.]


The condition that a matrix be invertible is closely related to its determinant. 


Big Inverse Theorem. A matrix is non-singular (invertible) $\Leftrightarrow$ its
determinant $\neq 0$.


While we don’t have to prove this, it’s actually fun and easy. In fact, we can get
halfway there in one line:

[Exercise. Prove that M non-singular $\Rightarrow \det(M) \neq 0$ (in a single line). Hint:
Use the fact about determinants of products.]

The other direction is a consequence of a popular result in matrix equations. Since 
one or more of our quantum algorithms will refer to this result, we’ll devote the next 
section to it and do an example. Before we leave this section, though, I need to place
into evidence our second “little theorem.”


Little Inverse Theorem “B”. If $M\mathbf{v} = \mathbf{0}$ for some non-zero vector $\mathbf{v}$, then
$\det(M) = 0$.

Proof. Little Theorem A tells us that the hypothesis implies M has no inverse,
and the subsequent Big Theorem tells us that this forces $\det(M) = 0$. QED




3.9 Matrix Equations and Cramer’s Rule 


3.9.1 Systems of Linear Equations 


A system of simultaneous linear equations with n unknowns is a set of equations in
variables $x_1, x_2, \ldots, x_n$, which only involves sums of first-order (linear) terms of each
variable, e.g.,

\[
\begin{aligned}
4x_2 - x_5 + 3.2\,x_1 &= 19 + 5x_3 \\
x_1 + x_2 + x_3 + x_4 + x_5 &= 1 \\
10x_4 - 22x_1 &= 85 - x_2 + x_1
\end{aligned}
\]

To solve the system uniquely, one needs to have at least as many equations as
there are unknowns. So the above system does not have a unique solution (although
you can find some simpler relationships between the variables if you try). Even if you 
do have exactly the same number of equations as unknowns, there still may not be a 
unique solution since one of the equations might — to cite one possibility — be a mere 
multiple of one of the others, adding no new information. Each equation has to add 
new information — it must be independent of all the others. 


Matrix Equations 


We can express this system of equations concisely using the language of matrix mul- 
tiplication, 


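\[
\begin{pmatrix}
3.2 & 4 & -5 & 0 & -1 \\
1 & 1 & 1 & 1 & 1 \\
-23 & 1 & 0 & 10 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\;=\;
\begin{pmatrix} 19 \\ 1 \\ 85 \end{pmatrix}
\]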


If we were able to get two more relationships between the variables, independent of 
these three, we would have a complete system represented by a square 5 x 5 matrix 
on the LHS. For example, 


32 4 -5 0 -1 Di 19 

s+ Cie ade. eh 1 X2 1 
—23 1 0 10 4O x3 = | 85 
2.50 7m 09 50 1 L4 0 
2r Q 83 -1 —-17 Ls 4 


Setting M = the 5 x 5 matrix on the left, v = the vector of unknowns, and c = the 
vector of constants on the right, this becomes 


\[ M\mathbf{v} \;=\; \mathbf{c} . \]


How can we leverage the language of matrices to get a solution? We want to know
what all the $x_k$ are. That’s the same as having a vector equation in which $\mathbf{v}$ is all
alone on the LHS,

\[ \mathbf{v} \;=\; \mathbf{d} , \]

and the $\mathbf{d}$ on the RHS is a vector of constants. The scent of a solution should be
wafting in the breeze. If $M\mathbf{v} = \mathbf{c}$ were a scalar equation we would divide by M.
Since it’s a matrix equation there are two differences: 


1. Instead of dividing by M, we multiply each side of the equation (on the left) by
$M^{-1}$.

2. $M^{-1}$ may not even exist, so this only works if M is non-singular.


If M is non-singular, and we can calculate $M^{-1}$, we apply bullet 1 above,

\[
\begin{aligned}
M^{-1} M \mathbf{v} &= M^{-1}\mathbf{c} \\
I \mathbf{v} &= M^{-1}\mathbf{c} \\
\mathbf{v} &= M^{-1}\mathbf{c} .
\end{aligned}
\]

This leaves us with two action items.

1. Determine whether or not M is invertible (non-singular).

2. If it is, compute the inverse, $M^{-1}$.


We know how to do item 1; if $\det(M)$ is non-zero, it can be inverted. For item 2,
we have to learn how to solve a system of linear equations, something we are perfectly
situated to do given the work we have already done in this lecture.


3.9.2 Cramer’s Rule 


Today’s technique, called Cramer’s rule, is a clean and easy way to invert a matrix, 
but not used much in practice due to its proclivity toward round-off error and poor 
computer performance. (We'll learn alternatives based on so-called normal forms and 
Gaussian elimination when we get into Simon’s and Shor’s quantum algorithms). But 
Cramer’s rule is very cute, so let’s get enlightened. 


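Cramer’s Rule. A system of linear equations,

\[ M\mathbf{v} \;=\; \mathbf{c} , \]

(like the 5 × 5 system above) can be solved uniquely for the unknowns
$x_k$ $\Leftrightarrow$ $\det(M) \neq 0$. In that case each $x_k$ is given by

\[ x_k \;=\; \frac{\det M_k}{\det M} , \]

where $M_k$ is the matrix M with its kth column replaced by the constant
vector $\mathbf{c}$ (see Figure 3.4).


\[
\det M_k \;=\; \det
\begin{pmatrix}
M_{11} & \cdots & c_1 & \cdots & M_{1n} \\
M_{21} & \cdots & c_2 & \cdots & M_{2n} \\
\vdots &        & \vdots &     & \vdots \\
M_{n1} & \cdots & c_n & \cdots & M_{nn}
\end{pmatrix}
\qquad (\text{the } c_i \text{ occupy the } k\text{th column})
\]

Figure 3.4: The numerator of Cramer’s fraction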


We will not prove Cramer’s rule. You can take it on faith, look up a proof, or 
have a whack at it yourself. 


Example Application of Cramer’s Rule 


We can compute the relevant determinants of the 5 x 5 system above with the help 
of Mathematica or similar software which computes determinants for us. Or, we can 
write our own determinant calculator using very few instructions and recursion, and 
do it all ourselves. First, we check the main determinant, 


\[ \det(M) \;=\; -167077 \;\neq\; 0 . \quad \checkmark \]

So M is non-singular $\Rightarrow$ system is solvable. Now for Cramer’s numerators:

\[
\begin{aligned}
\det(M_1) &= -635709 \\
\det(M_2) &= 199789 \\
\det(M_3) &= 423127 \\
\det(M_4) &= 21996.3 \\
\det(M_5) &= -176281
\end{aligned}
\]

Using this, we solve for the first unknown,

\[ x_1 \;=\; \frac{-635709}{-167077} \;=\; 3.80488 . \]


[Exercise. Compute the other four $x_k$, and confirm that any one of the equations
in the original system holds for these five values.]
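
The text mentions writing our own recursive determinant calculator. One way that might look is sketched below, together with Cramer's rule built on top of it. The function names `det` and `cramer_solve` and the 3 × 3 system are assumptions chosen for illustration (this is not the 5 × 5 system above). The recursion takes time that grows factorially with n, which is in line with the remark that Cramer's rule is rarely used in practice.

```python
from fractions import Fraction


def det(A):
    """Recursive cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for k in range(len(A)):
        # Minor: delete the first row and the kth column.
        sub = [row[:k] + row[k + 1:] for row in A[1:]]
        total += (-1) ** k * A[0][k] * det(sub)
    return total


def cramer_solve(M, c):
    """Solve M v = c by Cramer's rule; raise ValueError if det(M) == 0."""
    d = det(M)
    if d == 0:
        raise ValueError("det(M) = 0: no unique solution")
    solution = []
    for k in range(len(M)):
        # M_k is M with its kth column replaced by the constant vector c.
        M_k = [row[:k] + [c[i]] + row[k + 1:] for i, row in enumerate(M)]
        solution.append(Fraction(det(M_k), d))
    return solution


# Hypothetical 3 x 3 system with integer entries, for illustration only.
M = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
c = [8, -11, -3]

print(cramer_solve(M, c))
```

Exact `Fraction` arithmetic is used here so that the round-off issue mentioned above does not cloud the check; it requires integer (or rational) entries.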


Computing Matrix Inverses Using Cramer’s Rule 


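As noted, Cramer’s rule is not very useful for writing software, but we can apply it
to the problem of finding a matrix inverse, especially when dealing with small, 2 × 2,
matrices by hand. Say we are given such a matrix

\[ M \;=\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} , \]

that we have confirmed to be non-singular by computing $ad - bc$. So we know it has
an inverse, which we will temporarily write as

\[ M^{-1} \;=\; \begin{pmatrix} e & f \\ g & h \end{pmatrix} . \]

The only thing we can say with certainty, at this point, is that

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
\;=\; \mathbb{1} \;=\;
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .
\]

Our goal is to solve this matrix equation. We do so in two parts. First, we break this
into an equation which only involves the first column of the purported inverse,

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} e \\ g \end{pmatrix}
\;=\;
\begin{pmatrix} 1 \\ 0 \end{pmatrix} .
\]

This is exactly the kind of 2-equation linear system we have already conquered.
Cramer’s rule tells us that it has a solution (since $\det(M) \neq 0$) and the solution
is given by

\[ e \;=\; \frac{\det \begin{pmatrix} 1 & b \\ 0 & d \end{pmatrix}}{\det M} \]

and

\[ g \;=\; \frac{\det \begin{pmatrix} a & 1 \\ c & 0 \end{pmatrix}}{\det M} . \]

The same moves can be used on the second column of $M^{-1}$, to solve

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} f \\ h \end{pmatrix}
\;=\;
\begin{pmatrix} 0 \\ 1 \end{pmatrix} .
\]

[Exercise. Write down the corresponding quotients that give f and h.]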


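Example. We determine the invertibility of M and, if invertible, compute its
inverse, where

\[ M \;=\; \begin{pmatrix} -12 & 1 \\ 15 & 1 \end{pmatrix} . \]

The determinant is

\[ \det M \;=\; (-12)(1) - (1)(15) \;=\; -27 \;\neq\; 0 , \]

which tells us that the inverse does exist. Setting

\[ M^{-1} \;=\; \begin{pmatrix} e & f \\ g & h \end{pmatrix} , \]

we solve first for the left column $\begin{pmatrix} e \\ g \end{pmatrix}$. Following the procedure above,

\[ e \;=\; \frac{\det \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}}{\det M} \;=\; \frac{1}{-27} \;=\; -\frac{1}{27} \]

and

\[ g \;=\; \frac{\det \begin{pmatrix} -12 & 1 \\ 15 & 0 \end{pmatrix}}{\det M} \;=\; \frac{-15}{-27} \;=\; \frac{15}{27} , \]

which we don’t simplify ... yet. Continuing on to solve for f and h results in the final
inverse matrix

\[
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
\;=\;
\begin{pmatrix} -1/27 & 1/27 \\ 15/27 & 12/27 \end{pmatrix}
\;=\;
\frac{1}{27} \begin{pmatrix} -1 & 1 \\ 15 & 12 \end{pmatrix} ,
\]

and we can see why we did not simplify the expression of g.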


[Exercise. For the above example, confirm that $M M^{-1} = \mathrm{Id}$.]
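
For readers who want to check hand computations of this kind, here is a small sketch of the column-by-column procedure for 2 × 2 matrices. The helper names `det2`, `inverse2` and `matmul2` are assumptions introduced for this illustration. Each column of the inverse is obtained from Cramer's rule applied to the right-hand sides $(1, 0)^t$ and $(0, 1)^t$.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    """Invert a 2 x 2 integer matrix one column at a time via Cramer's rule."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    (a, b), (c, dd) = m
    # First column (e, g) solves M (e, g)^t = (1, 0)^t.
    e = Fraction(det2([[1, b], [0, dd]]), d)
    g = Fraction(det2([[a, 1], [c, 0]]), d)
    # Second column (f, h) solves M (f, h)^t = (0, 1)^t.
    f = Fraction(det2([[0, b], [1, dd]]), d)
    h = Fraction(det2([[a, 0], [c, 1]]), d)
    return [[e, f], [g, h]]


def matmul2(x, y):
    """Product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


M = [[-12, 1], [15, 1]]
M_inv = inverse2(M)
print(M_inv)
print(matmul2(M, M_inv))  # should be the identity matrix
```

Note that `Fraction` reduces 15/27 to 5/9 automatically, so the printed entries look different from, but are equal to, the unsimplified values used in the example.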


[Exercise. Compute the inverse of 


using Cramer’s rule. Check your work.] 


[Exercise. Show that 


is non-singular, then compute its inverse using Cramer’s rule. Check your work.]


Completing the Proof of the Big Inverse Theorem 
Remember this


Big Inverse Theorem. A matrix is non-singular (invertible) $\Leftrightarrow$ its
determinant $\neq 0$.


You proved $\Rightarrow$ in an exercise. Now you can do $\Leftarrow$ in another exercise.


[Exercise. Using Cramer’s rule as a starting point, prove that

\[ \det(M) \neq 0 \;\Rightarrow\; M \text{ is non-singular.} \]

Hint. We just did it in our little 2 × 2 and 3 × 3 matrices.]




Chapter 4 


Hilbert Space 


4.1 Complex Vector Spaces for Quantum Computing


4.1.1 The Vector Nature of a Qubit 


We are gradually constructing a mathematical language that can be used to accurately 
model the physical qubit and hardware that processes it. A typical qubit is expressed 
using the symbolism 


\[ |\psi\rangle \;=\; \alpha\,|0\rangle \,+\, \beta\,|1\rangle . \]


In the sneak peek given in the introduction I leaked the meaning of the two numbers $\alpha$
and $\beta$. They embody the probabilistic nature of quantum bits. While someone might
have prior knowledge that a qubit with the precise value “$\alpha|0\rangle + \beta|1\rangle$” (whatever
that means) was sitting in a memory location, if we tried to read the value of that
location (that is, if we measured it) we would see evidence of neither $\alpha$ nor $\beta$, but
instead observe only a classical bit; our measurement device would register one of two
possible outcomes: “0” or “1.” Yet $\alpha$ and $\beta$ do play a role here. They tell us our
measurement would be


“0” with probability $|\alpha|^2$ and
“1” with probability $|\beta|^2$.


Obviously, there’s a lot yet to understand about how this unpredictable behavior can
be put to good use, but for today we move a step closer by adding the following clue:


The qubit shown above is a vector having unit length, and the $\alpha$, $\beta$ are
its coordinates in what we shall see is the vector space’s natural basis
$\{\, |0\rangle, |1\rangle \,\}$.


In that sense, a qubit seems to correspond, if not physically at least mathematically,
to a vector in some vector space where $\alpha$ and $\beta$ are two scalars of the system.
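
As a small illustration of the probability rule just stated, the sketch below computes the two measurement probabilities from a pair of amplitudes. The function name `measurement_probabilities` and the sample amplitudes are assumptions made for this example. The function also checks the unit-length condition, under which the two probabilities add to 1.

```python
import math


def measurement_probabilities(alpha, beta):
    """Return (P("0"), P("1")) for the qubit alpha|0> + beta|1>."""
    norm_sq = abs(alpha) ** 2 + abs(beta) ** 2
    if not math.isclose(norm_sq, 1.0):
        raise ValueError("the qubit must be a unit vector")
    return abs(alpha) ** 2, abs(beta) ** 2


# Hypothetical amplitudes chosen for illustration: alpha = 3/5, beta = 4i/5.
p0, p1 = measurement_probabilities(0.6, 0.8j)
print(p0, p1)  # roughly 0.36 and 0.64
```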




4.1.2 The Complex Nature of a Qubit 


We need to know whatever is possible to know about a hypothetical quantum circuit 
using paper and pencil so we can predict what the hardware will do before we build 
it. I’m happy to report that the mathematical language that works in this situation is
one of the most accurate and well tested in the history of science: quantum mechanics.
It behooves me to report that a real vector space like $\mathbb{R}^2$ or $\mathbb{R}^3$ doesn’t work. To make
accurate predictions about qubits and the logic circuitry that entangles them, we’ll
need to allow $\alpha$ and $\beta$ to be complex. Thus, our vector space must be defined over a
complex scalar field.


Besides the straightforward consequences resulting from the use of complex arith- 
metic, we'll have to define an inner product that differs from the simple dot product 
of real vector spaces. At that point we’ll be working in what mathematicians call 
complex Hilbert space or simply Hilbert space. To make the correspondence between 
our math and the real world complete, we’ll have to add one last tweak: we will con- 
sider only vectors of unit length. That is, we will need to work on something called 
the projective sphere, a.k.a. projective Hilbert space.


Today we learn how to manipulate vectors in a complex Hilbert space and the 
projective sphere within it. 


4.2 The Complex Vector Space, $\mathbb{C}^n$


Since we are secure enough in the vocabulary of real vector spaces, there’s no need
to dilly-dally around with 2- or 3-dimensional spaces. We can go directly to the fully
general n-dimensional complex space, which we call $\mathbb{C}^n$.


Its scalars are the complex numbers, $\mathbb{C}$, and its vectors are n-tuples of complex
numbers,

\[
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{pmatrix} ,
\qquad c_k \in \mathbb{C}, \quad k = 0 \text{ to } n-1 .
\]

Except for the inner product, everything else works like the real space $\mathbb{R}^n$ with
the plot twist that the components are complex. In fact, the natural basis is actually
identical to $\mathbb{R}^n$’s basis.


[Exercise. Prove that the same natural basis works for both $\mathbb{R}^n$ and $\mathbb{C}^n$.]


There are, however, other bases for $\mathbb{C}^n$ which have no counterpart in $\mathbb{R}^n$, since
their components can be complex.


[Exercise. Drum up a basis for $\mathbb{C}^n$ that has no real counterpart. Then find a
vector in $\mathbb{C}^n$ which is not in $\mathbb{R}^n$ but whose coordinates relative to this basis are all
real.]


[Exercise. Is $\mathbb{R}^n \subset \mathbb{C}^n$ as a set? As a subspace? Justify your answers.]


4.3 The Complex Inner Product 


So, what’s wrong with $\mathbb{R}^n$’s dot product in the complex case? If we were to define it
as we do in the real case, for example as in $\mathbb{R}^2$,

\[
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}
\;=\; x_1 x_2 + y_1 y_2 ,
\]

we’d have a problem with lengths of vectors which we want to be $\geq 0$. Recall that
lengths are defined by dotting a vector with itself (I’ll remind you about the exact
details in a moment). To wit, for a complex $\mathbf{a}$,

\[ \mathbf{a} \cdot \mathbf{a} \;=\; \sum_{k=0}^{n-1} a_k^2 \]

is not necessarily real, never mind non-negative (try a vector whose components are
all $1 + i$). When it is real, it could still be negative:

\[
\begin{pmatrix} 5i \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 5i \\ 3 \end{pmatrix}
\;=\; (5i)^2 + 3^2 \;=\; -25 + 9 \;=\; -16 .
\]

We don’t want to have complex, imaginary or negative lengths typically, so we need
a different definition of dot product for complex vector spaces. We define the product

\[
\mathbf{a} \cdot \mathbf{b} \;\equiv\; \sum_{k=0}^{n-1} \overline{a_k}\, b_k
\;=\; \sum_{k=0}^{n-1} (a_k)^* \, b_k .
\]


When defined in this way, some authors prefer the term inner product, reserving 
dot product for the real vector space analog (although some authors say complex dot 
product, so you have to adapt.) 


Notation. An alternative notation for the (complex) inner product is sometimes
used,

\[ \langle \mathbf{a}, \mathbf{b} \rangle , \]

and in quantum mechanics, you’ll always see

\[ \langle \mathbf{a} \,|\, \mathbf{b} \rangle . \]

Caution #1. The complex inner product is not commutative, i.e.,

\[ \langle \mathbf{a} \,|\, \mathbf{b} \rangle \;\neq\; \langle \mathbf{b} \,|\, \mathbf{a} \rangle . \]

There is an important relationship between the two, however, namely

\[ \langle \mathbf{a} \,|\, \mathbf{b} \rangle^* \;=\; \langle \mathbf{b} \,|\, \mathbf{a} \rangle . \]


[Exercise. Show that the definition of complex inner product implies the above
result.]


Caution #2. The complex inner product can be defined by conjugating the $b_k$s,
rather than the $a_k$s, and this would produce a different result, one which is the complex
conjugate of our defined inner product. However, physicists (and we) conjugate the
left-hand vector’s coordinates because it produces nicer looking formulas.


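Example. Let

\[
\mathbf{a} = \begin{pmatrix} 1+i \\ 3 \end{pmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{pmatrix} 1-2i \\ 5i \end{pmatrix} .
\]

Then

\[
\begin{aligned}
\langle \mathbf{a} \,|\, \mathbf{b} \rangle &= (1+i)^* (1-2i) \,+\, (3^*)\,5i \\
&= (1-i)(1-2i) \,+\, (3)\,5i \\
&= (1 - i - 2i - 2) \,+\, 15i \\
&= -1 + 12i .
\end{aligned}
\]

[Exercise. Compute $\langle \mathbf{b} \,|\, \mathbf{a} \rangle$ and verify that it is the complex conjugate of $\langle \mathbf{a} \,|\, \mathbf{b} \rangle$.]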
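
The contrast between the real-style dot product and the complex inner product can be seen in a few lines of code. This is only a sketch; the function names `naive_dot` and `inner` are assumptions made for illustration. It reuses the vector $(5i, 3)^t$ and the example vectors $\mathbf{a}$ and $\mathbf{b}$ from above, and conjugates the left-hand vector as in our definition.

```python
def naive_dot(a, b):
    """Real-style dot product with no conjugation (unsuitable for C^n)."""
    return sum(x * y for x, y in zip(a, b))


def inner(a, b):
    """Complex inner product <a|b>, conjugating the left-hand vector."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


w = [5j, 3]
print(naive_dot(w, w))  # a negative "length squared"
print(inner(w, w))      # real and non-negative

a = [1 + 1j, 3]
b = [1 - 2j, 5j]
print(inner(a, b))
print(inner(b, a) == inner(a, b).conjugate())
```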


More Properties of the Complex Inner Product 


In our linear algebra lesson we learned that inner products are distributive. In the 
second position, this means 


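\[ \langle \mathbf{a} \,|\, \mathbf{b} + \mathbf{b}' \rangle \;=\; \langle \mathbf{a} \,|\, \mathbf{b} \rangle \,+\, \langle \mathbf{a} \,|\, \mathbf{b}' \rangle , \]

and the same is true in the first position. For the record, we should collect two more
properties that apply to the all-important complex inner product.


• It is linear in the second position. If c is a complex scalar,
\[ c\, \langle \mathbf{a} \,|\, \mathbf{b} \rangle \;=\; \langle \mathbf{a} \,|\, c\,\mathbf{b} \rangle . \]
• It is anti-linear in the first position. If c is a complex scalar,
\[ c^* \langle \mathbf{a} \,|\, \mathbf{b} \rangle \;=\; \langle c\,\mathbf{a} \,|\, \mathbf{b} \rangle . \]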
Again, this asymmetry is reversed in much of mathematics and engineering. Quantum 


mechanics naturally leads to anti-linearity in the left vector position, while you will 
normally see it in the right vector position in other fields. 
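
A quick numerical spot-check of the two scalar properties is sketched below, under the convention that the left-hand vector is conjugated. The helper names `inner` and `scale` and the scalar `c` are assumptions for this illustration. Note where the conjugate of `c` shows up.

```python
def inner(a, b):
    """Complex inner product <a|b>, conjugating the left-hand vector."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def scale(c, v):
    """Scalar multiple c v."""
    return [c * x for x in v]


a = [1 + 1j, 3]
b = [1 - 2j, 5j]
c = 2 - 3j  # hypothetical scalar, chosen for illustration

# Linear in the second position: <a|cb> = c <a|b>
print(inner(a, scale(c, b)) == c * inner(a, b))

# Anti-linear in the first position: <ca|b> = c* <a|b>
print(inner(scale(c, a), b) == c.conjugate() * inner(a, b))
```

The exact equality tests work here only because every intermediate value is a small integer-valued complex number; with general floating-point data a tolerance would be needed.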




4.3.1 Norm and Distance 


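The inner product on $\mathbb{C}^n$ confers it with a metric, that is, a way to measure things.
There are two important concepts that emerge:


1. Norm. The norm of a vector, now repurposed to our current complex vector
space, is defined by

\[ \|\mathbf{a}\| \;\equiv\; \sqrt{\langle \mathbf{a} \,|\, \mathbf{a} \rangle} \;=\; \sqrt{\sum_{k=0}^{n-1} |a_k|^2} . \]

2. Distance. The distance between two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined by

\[ \mathrm{dist}(\mathbf{a}, \mathbf{b}) \;\equiv\; \|\mathbf{b} - \mathbf{a}\| \;=\; \sqrt{\sum_{k=0}^{n-1} |b_k - a_k|^2} . \]


With the now correct definition of inner product we get the desired behavior when
we compute a norm,

\[ \|\mathbf{a}\|^2 \;=\; \langle \mathbf{a} \,|\, \mathbf{a} \rangle \;=\; \sum_{k=0}^{n-1} |a_k|^2 \;\geq\; 0 . \]

Since the length (norm or modulus) of $\mathbf{a}$, $\|\mathbf{a}\|$, is the non-negative square root of this
value, once again all is well: lengths are real and non-negative.


More Notation. You may see the modulus for a vector written in normal (not
bold) face, as in

\[ a \;=\; \|\mathbf{a}\| . \]

Another common syntax uses single, rather than double, bars,

\[ |\mathbf{a}| \;=\; \|\mathbf{a}\| . \]

It’s all acceptable.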


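Example (Continued). The norm (a.k.a. modulus) of $\mathbf{a}$ in the earlier example is

\[
\begin{aligned}
\|\mathbf{a}\| \;=\; \sqrt{\langle \mathbf{a} \,|\, \mathbf{a} \rangle} &= \sqrt{(1+i)^*(1+i) \,+\, (3^*)\,3} \\
&= \sqrt{(1-i)(1+i) \,+\, (3)\,3} \\
&= \sqrt{(1 - i + i + 1) \,+\, 9} \\
&= \sqrt{11} .
\end{aligned}
\]

[Exercise. Compute the norm of $\mathbf{b}$ from that same example, above.]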
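
Norm and distance translate directly into code. The sketch below is only illustrative; the function names `inner`, `norm` and `dist` are assumptions. It computes the norm of the example vector $\mathbf{a}$ and the distance between $\mathbf{a}$ and $\mathbf{b}$ from the inner product.

```python
import math


def inner(a, b):
    """Complex inner product <a|b>, conjugating the left-hand vector."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def norm(a):
    """Length of a: the non-negative square root of <a|a>."""
    return math.sqrt(inner(a, a).real)


def dist(a, b):
    """Distance between a and b, defined as the norm of b - a."""
    return norm([y - x for x, y in zip(a, b)])


a = [1 + 1j, 3]
b = [1 - 2j, 5j]
print(norm(a))     # sqrt(11), about 3.3166
print(dist(a, b))  # sqrt(43), about 6.5574
```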




Complex Orthonormality and Linear Independence 


We had a little theorem about orthonormal sets in real vector spaces. It’s just as true 
for complex vector spaces with the complex inner product. 


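Theorem. If a set of vectors $\{\mathbf{a}_k\}$ is orthonormal, it is necessarily linearly
independent.


Proof. We’ll assume the theorem is false and arrive at a contradiction. Say $\{\mathbf{a}_k\}$
is an orthonormal collection, yet one of them, $\mathbf{a}_0$, is a linear combination of the others,
i.e.,

\[ \mathbf{a}_0 \;=\; \sum_{k=1}^{n-1} c_k\, \mathbf{a}_k . \]

By orthonormality $\langle \mathbf{a}_0 \,|\, \mathbf{a}_k \rangle = 0$ for all $k \neq 0$, and $\langle \mathbf{a}_0 \,|\, \mathbf{a}_0 \rangle = 1$, so

\[
1 \;=\; \langle \mathbf{a}_0 \,|\, \mathbf{a}_0 \rangle
\;=\; \Big\langle \mathbf{a}_0 \,\Big|\, \sum_{k=1}^{n-1} c_k\, \mathbf{a}_k \Big\rangle
\;=\; \sum_{k=1}^{n-1} c_k\, \langle \mathbf{a}_0 \,|\, \mathbf{a}_k \rangle
\;=\; 0 ,
\]

a contradiction. QED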


4.3.2 Expansion Coefficients 


The dot-product trick for computing expansion coefficients along an orthonormal
basis works in this regime, but we have to be careful due to this slight asymmetry.
If we want to expand $\mathbf{v}$ along an orthonormal $\mathcal{B} = \{\mathbf{b}_k\}$, we still “dot it” with the
individual basis vectors. Say the (as yet unknown) coefficients of $\mathbf{v}$ in this basis are
$(\ldots, \beta_k, \ldots)$. We compute each one using

\[
\langle \mathbf{b}_k \,|\, \mathbf{v} \rangle
\;=\; \Big\langle \mathbf{b}_k \,\Big|\, \sum_{j=0}^{n-1} \beta_j\, \mathbf{b}_j \Big\rangle
\;=\; \sum_{j=0}^{n-1} \beta_j\, \langle \mathbf{b}_k \,|\, \mathbf{b}_j \rangle ,
\]

and since orthonormality means

\[ \langle \mathbf{b}_k \,|\, \mathbf{b}_j \rangle \;=\; \delta_{kj} , \]

the last sum collapses to the desired $\beta_k$. However, we had to be careful to place our $\mathbf{v}$
on the right side of the inner product. Otherwise, we would not get the kth expansion
coefficient, but its ______.


[Exercise. Fill in the blank.]


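Example. In $\mathbb{C}^2$ we expand $\mathbf{v} = \begin{pmatrix} 1+i \\ 1-i \end{pmatrix}$ along the natural basis $\mathcal{A}$,

\[
\mathcal{A} \;\equiv\; \{ \hat{\mathbf{e}}_0, \hat{\mathbf{e}}_1 \}
\;=\; \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} .
\]

(We do this easy example because the answer is obvious: the coordinates along $\mathcal{A}$
should match the pure vector components, otherwise we have a problem with our
technique. Let’s see ...)


We seek

\[ \begin{pmatrix} v_0 \\ v_1 \end{pmatrix}_{\mathcal{A}} . \]

The “dotting trick” says

\[
\begin{aligned}
v_0 \;=\; \langle \hat{\mathbf{e}}_0 \,|\, \mathbf{v} \rangle &= (e_{00})^* v_0 + (e_{01})^* v_1 \\
&= 1\,(1+i) + 0\,(1-i) \;=\; 1+i ,
\end{aligned}
\]

and

\[
\begin{aligned}
v_1 \;=\; \langle \hat{\mathbf{e}}_1 \,|\, \mathbf{v} \rangle &= (e_{10})^* v_0 + (e_{11})^* v_1 \\
&= 0\,(1+i) + 1\,(1-i) \;=\; 1-i ,
\end{aligned}
\]

so

\[ \mathbf{v} \;=\; \begin{pmatrix} 1+i \\ 1-i \end{pmatrix}_{\mathcal{A}} , \]

as expected ($\checkmark$). Of course that wasn’t much fun since natural basis vector
components were real (0 and 1), thus there was nothing to conjugate. Let’s do one with a
little crunch.


Example. In $\mathbb{C}^2$ we expand the same $\mathbf{v} = \begin{pmatrix} 1+i \\ 1-i \end{pmatrix}$ along the basis $\mathcal{B}$,

\[
\mathcal{B} \;\equiv\; \{ \hat{\mathbf{b}}_0, \hat{\mathbf{b}}_1 \}
\;=\; \left\{ \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}, \begin{pmatrix} i\sqrt{2}/2 \\ -i\sqrt{2}/2 \end{pmatrix} \right\} .
\]

First, we confirm that this basis is orthonormal (because the dot-product trick only
works for orthonormal bases).

\[
\begin{aligned}
\langle \hat{\mathbf{b}}_0 \,|\, \hat{\mathbf{b}}_1 \rangle &= (b_{00})^*\, b_{10} + (b_{01})^*\, b_{11} \\
&= (\sqrt{2}/2)(i\sqrt{2}/2) + (\sqrt{2}/2)(-i\sqrt{2}/2) \\
&= i/2 - i/2 \;=\; 0 \quad \checkmark
\end{aligned}
\]

Also,

\[
\begin{aligned}
\langle \hat{\mathbf{b}}_0 \,|\, \hat{\mathbf{b}}_0 \rangle &= (b_{00})^*\, b_{00} + (b_{01})^*\, b_{01} \\
&= (\sqrt{2}/2)(\sqrt{2}/2) + (\sqrt{2}/2)(\sqrt{2}/2) \\
&= 1/2 + 1/2 \;=\; 1 \quad \checkmark
\end{aligned}
\]

and

\[
\begin{aligned}
\langle \hat{\mathbf{b}}_1 \,|\, \hat{\mathbf{b}}_1 \rangle &= (b_{10})^*\, b_{10} + (b_{11})^*\, b_{11} \\
&= (-i\sqrt{2}/2)(i\sqrt{2}/2) + (i\sqrt{2}/2)(-i\sqrt{2}/2) \\
&= 1/2 + 1/2 \;=\; 1 \quad \checkmark
\end{aligned}
\]

which establishes orthonormality.

We seek

\[ \begin{pmatrix} v_0 \\ v_1 \end{pmatrix}_{\mathcal{B}} . \]

The “dotting trick” says

\[
\begin{aligned}
v_0 \;=\; \langle \hat{\mathbf{b}}_0 \,|\, \mathbf{v} \rangle &= (b_{00})^* v_0 + (b_{01})^* v_1 \\
&= (\sqrt{2}/2)(1+i) + (\sqrt{2}/2)(1-i) \\
&= \sqrt{2} ,
\end{aligned}
\]

and

\[
\begin{aligned}
v_1 \;=\; \langle \hat{\mathbf{b}}_1 \,|\, \mathbf{v} \rangle &= (b_{10})^* v_0 + (b_{11})^* v_1 \\
&= (-i\sqrt{2}/2)(1+i) + (i\sqrt{2}/2)(1-i) \\
&= \sqrt{2} .
\end{aligned}
\]

Finally, we check our work.

\[
\begin{aligned}
\sqrt{2}\, \hat{\mathbf{b}}_0 + \sqrt{2}\, \hat{\mathbf{b}}_1
&= \sqrt{2} \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} + \sqrt{2} \begin{pmatrix} i\sqrt{2}/2 \\ -i\sqrt{2}/2 \end{pmatrix} \\
&= \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} i \\ -i \end{pmatrix} \\
&= \begin{pmatrix} 1+i \\ 1-i \end{pmatrix} \quad \checkmark
\end{aligned}
\]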
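
The dotting trick is easy to mechanize. The sketch below uses assumed function names (`inner`, `coordinates`) that are not part of the course. It computes the coordinates of $\mathbf{v} = (1+i,\, 1-i)^t$ along the orthonormal basis $\mathcal{B}$ from the example, with $\mathbf{v}$ on the right side of each inner product, and then rebuilds $\mathbf{v}$ from those coordinates as a check.

```python
import math


def inner(a, b):
    """Complex inner product <a|b>, conjugating the left-hand vector."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def coordinates(v, basis):
    """Coefficients of v along an orthonormal basis: beta_k = <b_k|v>."""
    return [inner(b_k, v) for b_k in basis]


s = math.sqrt(2) / 2
B = [[s, s], [1j * s, -1j * s]]  # the orthonormal basis from the example
v = [1 + 1j, 1 - 1j]

beta = coordinates(v, B)
print(beta)  # both coefficients are close to sqrt(2)

# Rebuild v as sum_k beta_k b_k to confirm the expansion.
rebuilt = [sum(beta[k] * B[k][i] for k in range(2)) for i in range(2)]
print(rebuilt)
```

Because the entries are floating-point numbers, the results agree with the hand computation only up to round-off.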


[Exercise. Work in $\mathbb{C}^2$ and use the same $\mathbf{v}$ as above to get its coordinates relative
to the basis $\mathcal{C}$,


c= wan = (ACT a(S} 


Before starting, demonstrate that this basis is orthonormal.]


4.4 Hilbert Space 


4.4.1 Definitions 
A Hilbert Space is a real or complex vector space that 


1. has an inner product and 


2. is complete. 


You already know what an inner product is. A vector space that has an inner product 
is usually called an inner-product space. 


Completeness is different from the property that goes by the same name (unfor- 
tunately) involving vectors spanning a space. This is not that. Rather, completeness 
here means (please grab something stable and try not to hit your head) that any 
Cauchy sequence of vectors in the space converges to some vector also in the space. 
Unless you remember your advanced calculus, completeness won’t mean much, and 
it’s not a criterion that we will ever use, explicitly. However, if you want one more 
degree of explanation, read the next paragraph. 


Completeness of an Inner Product (Optional Reading) 


The inner product, as we saw, imbues the vector space with a distance function,
$\mathrm{dist}(\cdot,\cdot)$. Once we have a distance function, we can inspect any infinite sequence
of vectors, like $\{\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \ldots\}$, and test whether the distance between consecutive
vectors in that sequence, $\mathrm{dist}(\mathbf{a}_{k+1}, \mathbf{a}_k)$, gets small fast (i.e., obeys the Cauchy criterion;
please research the meaning of this phrase elsewhere, if interested). If it does,
$\{\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \ldots\}$ is called a Cauchy sequence. The vector space is complete if every
Cauchy sequence of vectors in the space approaches a unique vector, also in the space,
called the limit of the sequence.


Illustration of Completeness. The completeness criterion does not require a 
vector space in order that it be satisfied. It requires only a set and a norm (or metric) 
which allows us to measure distances. So we can ask about completeness of many sets 
that are not even vector spaces. The simplest example happens to not be a vector 
space. 


Consider the interval [0, 1] in $\mathbb{R}$ which includes both endpoints. This set is
complete with respect to the usual norm in $\mathbb{R}$. For example, the sequence $\{1 - \frac{1}{k}\}_{k=2}^{\infty}$
is Cauchy and converges to 1, which is, indeed, in [0, 1]. This does not establish that
[0, 1] is complete, but it indicates the kind of challenging sequence that might prove
a set is not complete. (See Figure 4.1.)


For a counter-example, consider the interval (0, 1) in $\mathbb{R}$ which does not include
either endpoint. The same sequence, just discussed, is still in the set, but its limit,
1, is not. So this set is not complete. (See Figure 4.2.)


Figure 4.1: The Cauchy sequence $\{1 - \frac{1}{k}\}_{k=2}^{\infty}$ has its limit in [0, 1]


Figure 4.2: The Cauchy sequence $\{1 - \frac{1}{k}\}_{k=2}^{\infty}$ does not have its limit in (0, 1)


Notation 


When I want to emphasize that we’re working in a Hilbert space, I’ll use the letter
$\mathcal{H}$ just as I use terms like $\mathbb{R}^2$ or $\mathbb{C}^n$ to denote real or complex vector spaces. $\mathcal{H}$ could
take the form of a $\mathbb{C}^2$ or a $\mathbb{C}^n$, and I normally won’t specify the dimension of $\mathcal{H}$ until
we get into tensor products.


4.4.2 Old Friends and New Acquaintances 


Finite Dimensional Hilbert Spaces 


Before we go further, let’s give recognition to an old friend which happens to be
a Hilbert space: $\mathbb{R}^n$ with the usual dot-product. This Euclidean space (actually
“these Euclidean spaces,” since there is one for each positive n) does(do) meet the
completeness criterion, so it(they) is(are) Hilbert space(s).


As we have been discussing, so, too, are the complex inner-product
spaces, $\mathbb{C}^n$, for each positive n.


For us, the most important Hilbert space will be $\mathbb{C}^2$.


Thus, we already know two classes of inner-product spaces that are “Hilbert”.


Infinite Dimensional Hilbert Spaces 


When physicists use the term Hilbert space, they often mean something slightly more 
exotic: function spaces. These are vector spaces whose vectors consist of sufficiently 
well-behaved functions, and whose inner product is usually defined in terms of an 
integral. 


The Space $L^2[a,b]$. For example, all complex-valued functions, $f(x)$, defined
over the real interval $[a, b]$ and which are square-integrable, i.e.,

\[ \int_a^b |f(x)|^2 \, dx \;<\; \infty , \]

form a vector space. We can define an inner product for any two such functions, f
and g, using

\[ \langle f \,|\, g \rangle \;\equiv\; \int_a^b f(x)^* \, g(x) \, dx , \]

and this inner-product will give a distance and norm that satisfy the completeness
criterion. Hilbert spaces very much like these are used to model the momentum and
position of sub-atomic particles.
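
To make the integral inner product a little more concrete, here is a rough numerical sketch. The function name `l2_inner`, the midpoint-rule approximation, and the two sample functions are all assumptions for illustration; nothing here is used later in the course. It approximates $\langle f \,|\, g \rangle$ on $[0, 2\pi]$ for $f(x) = e^{ix}$ and $g(x) = e^{2ix}$.

```python
import cmath
import math


def l2_inner(f, g, a, b, steps=10_000):
    """Approximate <f|g> = integral over [a, b] of f(x)* g(x) dx (midpoint rule)."""
    h = (b - a) / steps
    total = 0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += f(x).conjugate() * g(x)
    return total * h


def f(x):
    return cmath.exp(1j * x)


def g(x):
    return cmath.exp(2j * x)


print(l2_inner(f, f, 0, 2 * math.pi))  # close to 2*pi, the squared norm of f
print(l2_inner(f, g, 0, 2 * math.pi))  # close to 0: f and g are orthogonal
```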


4.4.3 Some Useful Properties of Hilbert Spaces 
We will be making implicit use of the following consequences of real and complex 


inner-product spaces as defined above, so it’s good to take a moment and meditate 
on each one. 


Triangle Inequality 


Both the real dot product of $\mathbb{R}^n$ and the complex inner product of $\mathbb{C}^n$ satisfy the
triangle inequality condition: For any vectors, $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$,

\[ \mathrm{dist}(\mathbf{x}, \mathbf{z}) \;\leq\; \mathrm{dist}(\mathbf{x}, \mathbf{y}) \,+\, \mathrm{dist}(\mathbf{y}, \mathbf{z}) . \]

Pictured in $\mathbb{R}^2$, we can see why this is called the triangle inequality. (See Figure 4.3.)


[Exercise. Pick three vectors in $\mathbb{C}^2$ and verify that the triangle inequality is
satisfied. Do this at least twice, once when the three vectors do not all lie on the
same complex line, $\{\alpha \mathbf{x} \mid \alpha \in \mathbb{C}\}$, and once when all three do lie on the same line
and $\mathbf{y}$ is “between” $\mathbf{x}$ and $\mathbf{z}$.]


Cauchy-Schwarz Inequality 


A more fundamental property of inner-product spaces is the Cauchy-Schwarz inequality
which says that any two vectors, $\mathbf{x}$ and $\mathbf{y}$, of an inner-product space satisfy

\[ |\langle \mathbf{x} \,|\, \mathbf{y} \rangle|^2 \;\leq\; \|\mathbf{x}\|^2 \, \|\mathbf{y}\|^2 . \]


Figure 4.3: Triangle inequality in a metric space (image from Wikipedia)


The LHS is the absolute-value-squared of a complex scalar, while the RHS is the 
product of two vector norms (squared), each of which is necessarily a non-negative 
real. Therefore, it is an inequality between two non-negative values. In words, it says 
that the magnitude of an inner product is never more than the product of the two 
component vector magnitudes. 


The Cauchy-Schwarz inequality becomes an exact equality if and only if the two 
vectors form a linearly dependent set, i.e., one is a scalar multiple of the other. 


[Exercise. Pick two vectors in $\mathbb{C}^2$ and verify that the Cauchy-Schwarz inequality
is satisfied. Do this at least twice, once when the two vectors are linearly independent
and once when they are linearly dependent.]
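
Both inequalities can be spot-checked numerically. In this sketch the helper names `inner`, `norm` and `dist` and the three sample vectors are assumptions chosen for illustration. The vector `z` is deliberately a scalar multiple of `x`, so the pair {x, z} shows the equality case of Cauchy-Schwarz.

```python
import math


def inner(a, b):
    """Complex inner product <a|b>, conjugating the left-hand vector."""
    return sum(p.conjugate() * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(inner(a, a).real)


def dist(a, b):
    return norm([q - p for p, q in zip(a, b)])


x = [1 + 1j, 3]
y = [1 - 2j, 5j]
z = [2j * c for c in x]  # a scalar multiple of x, so {x, z} is dependent

# Triangle inequality: dist(x, z) <= dist(x, y) + dist(y, z)
print(dist(x, z) <= dist(x, y) + dist(y, z))

# Cauchy-Schwarz: |<x|y>|^2 <= ||x||^2 ||y||^2, equality for dependent vectors
print(abs(inner(x, y)) ** 2, norm(x) ** 2 * norm(y) ** 2)
print(abs(inner(x, z)) ** 2, norm(x) ** 2 * norm(z) ** 2)
```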


4.5 Rays, Not Points 


4.5.1 Modeling Quantum Systems 


There is one final adjustment to be made if we are to accurately model quantum 
systems with these Hilbert spaces. It will simultaneously simplify your computations 
and confuse you. My hope is that there will be more “simplify” and less “confuse”. 


Everything we do computationally will indeed take place in some Hilbert space 
$\mathcal{H} = \mathbb{C}^n$. In that regard, we have covered all the basic moves in this short lesson
(minus linear transformations, which we present next time). However, in quantum 
mechanics, it is not true that every vector in H corresponds to a distinct physical 
state of our system. Rather, the situation is actually a little easier (if we don’t resist). 
First a definition: 


Definition of Ray. A ray (through the origin, $\mathbf{0}$) in a complex vector space is
the set of all the scalar multiples of some non-zero vector,

\[ \text{Ray belonging to } \mathbf{a} \neq \mathbf{0} \;\;\equiv\;\; \{ \alpha \mathbf{a} \mid \alpha \in \mathbb{C} \} . \]

Each possible physical state of our quantum system (a concept we will cover soon)
will be represented by a unit vector (vector of length one) in $\mathcal{H}$. Stated differently,
any two non-zero $\mathcal{H}$-vectors that differ by a complex scalar represent the same state,
so we may as well choose a vector of length one ($\|\mathbf{v}\| = 1$) that is on the same ray as
either (or both) of those two; we don’t have to distinguish between any of the infinite
number of vectors on that ray.

This equivalence of all vectors on a given ray makes the objects in our mathemat- 
ical model not points in H, but rays through the origin of H. 


Figure 4.4: A “3-D” quantum state is a ray in its underlying $\mathcal{H} = \mathbb{C}^3$ (panel title: A Single Quantum State)


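If two complex n-tuples differ by a mere scalar multiple, $\alpha$, they are to be
considered the same state in our state space, i.e.,

\[
\mathbf{b} = \alpha \mathbf{a} \;\; (\text{as } \mathcal{H}\text{-space vectors})
\quad \Longrightarrow \quad
\mathbf{b} \sim \mathbf{a} \;\; (\text{as a physical state}),
\]

to be read “$\mathbf{a}$ is equivalent to $\mathbf{b}$,” or “$\mathbf{a}$ and $\mathbf{b}$ represent the same quantum state.”
This identifies


[the quantum states modeled by $\mathcal{H}$]
with
[rays through the origin of that $\mathcal{H} = \mathbb{C}^n$].


(See Figure 4.4 for an example of an $\mathcal{H}$-state represented by its $\mathbb{C}^3$ ray.)


If we were to use brackets, $[\mathbf{a}]$, to mean “the ray represented by the n-tuple $\mathbf{a}$,”
then we could express a quantum state in $\mathcal{H}$ as

\[ \text{An } \mathcal{H}\text{-“state”} \;\;\longleftrightarrow\;\; [\mathbf{a}] \;\equiv\; \{ \alpha \mathbf{a} \mid \alpha \in \mathbb{C} \} . \]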


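Example. All three of these complex ordered-pairs lie on the same ray in $\mathbb{C}^2$, so
they all represent the same quantum state modeled by this two-dimensional $\mathcal{H}$:

\[
\mathbf{a} = \begin{pmatrix} 2 \\ i \end{pmatrix} , \qquad
\mathbf{b} = \begin{pmatrix} -10 \\ -5i \end{pmatrix} \qquad \text{and} \qquad
\mathbf{c} = \begin{pmatrix} \dfrac{3-i}{5} \\[2mm] \dfrac{1+3i}{10} \end{pmatrix} .
\]

[Exercise. Elaborate.]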


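Dividing any one of them by its norm will produce a unit vector (vector with
modulus one) which also represents the same ray. Using the simplest representative,

\[ \frac{\mathbf{a}}{\|\mathbf{a}\|} \;=\; \begin{pmatrix} 2/\sqrt{5} \\ i/\sqrt{5} \end{pmatrix} , \]

often written as

\[ \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ i \end{pmatrix} . \]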


For example, if we ended up with $\mathbf{a}$ as the answer to a problem, we could replace
it by a unit vector along its ray,

\[ \hat{\mathbf{a}} \;\equiv\; \frac{\mathbf{a}}{\|\mathbf{a}\|} . \]

(See Figure 4.5.)


Figure 4.5: Dividing a vector by its norm yields a unit vector on the same ray 
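
Normalization is a one-line operation in code. The sketch below uses assumed names (`norm`, `normalize`) to divide a non-zero vector by its norm, and refuses the zero vector, which has no unit representative. The sample vectors are $(2, i)^t$ and its multiple $(-10, -5i)^t$.

```python
import math


def norm(a):
    """Length of a complex n-tuple: sqrt of the sum of |a_k|^2."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))


def normalize(a):
    """Return the unit vector a / ||a|| on the same ray as the non-zero vector a."""
    length = norm(a)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / length for x in a]


a = [2, 1j]
b = [-10, -5j]  # equals -5 a, so it lies on the same ray as a

a_hat = normalize(a)
b_hat = normalize(b)
print(a_hat, norm(a_hat))
print(b_hat, norm(b_hat))
```

The two unit vectors printed here are not equal: they differ by the scalar factor −1, which has modulus one. Both are legitimate unit representatives of the same ray.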


Computing a unit length alternative to a given vector in H turns out to be of 
universal applicability in quantum mechanics, because it makes possible the compu- 
tation of probabilities for each outcome of a measurement. The fact that a vector has 
norm = 1 corresponds to the various possible measurement probabilities adding to 1 
(100% chance of getting some measurement). We'll see all this soon enough. 

Caution. Figures 4.4 and 4.5 suggest that once you know the magnitude of $\mathbf{a}$,
that will nail-it-down as a unique $\mathbb{C}^n$ representative at that distance along the ray.
This is far from true. Each point pictured on the ray, itself, constitutes infinitely
many different n-tuples in $\mathbb{C}^n$, all differing by a factor of $e^{i\theta}$, for some real $\theta$.


Example (continued). We have seen that $\mathbf{a} = (2, i)^t$ has norm $\sqrt{5}$. A different
$\mathbb{C}^n$ representative for this $\mathcal{H}$-space vector is $(e^{\pi i/6})\,\mathbf{a}$. We can easily see that $(e^{\pi i/6})\,\mathbf{a}$
has the same length as $\mathbf{a}$. In words, $|e^{\pi i/6}| = 1$ (prove it or go back and review your
complex arithmetic module), so multiplying by $e^{\pi i/6}$, while changing the $\mathbb{C}^n$ vector,
will not change its modulus (norm). Thus, that adjustment not only produces a
different representative, it does so without changing the norm. Still, it doesn’t hurt
to calculate the norm of the product the long way, just for exercise:

\[
\begin{aligned}
e^{\pi i/6}\, \mathbf{a}
&= \begin{pmatrix} 2\, e^{\pi i/6} \\ i\, e^{\pi i/6} \end{pmatrix}
 = \begin{pmatrix} 2\,(\cos \pi/6 + i \sin \pi/6) \\ i\,(\cos \pi/6 + i \sin \pi/6) \end{pmatrix} \\
&= \begin{pmatrix} 2 \cos \pi/6 + 2i \sin \pi/6 \\ -\sin \pi/6 + i \cos \pi/6 \end{pmatrix}
 = \begin{pmatrix} \sqrt{3} + i \\ -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i \end{pmatrix} \\
&= \frac{1}{2} \begin{pmatrix} 2\sqrt{3} + 2i \\ -1 + i\sqrt{3} \end{pmatrix} .
\end{aligned}
\]

I did the hard part: I simplified the rotated vector (we call the act of multiplying by
$e^{i\theta}$ for real $\theta$ a rotation because of the geometric implication which you can imagine,
look up, or simply accept). All that’s left to do in this computation is calculate the
norm and see that it is $\sqrt{5}$.


[Exercise. Close the deal.]
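
The same computation can be confirmed numerically. In this sketch the names `norm`, `phase` and `rotated` are assumptions for illustration. We multiply $\mathbf{a} = (2, i)^t$ by the unit-modulus scalar $e^{\pi i/6}$ and compare norms before and after.

```python
import cmath
import math


def norm(a):
    """Length of a complex n-tuple: sqrt of the sum of |a_k|^2."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))


a = [2, 1j]
phase = cmath.exp(1j * math.pi / 6)  # e^{i pi/6}, a scalar of modulus 1
rotated = [phase * x for x in a]

print(abs(phase))              # close to 1
print(rotated)                 # close to (sqrt(3) + i, -1/2 + (sqrt(3)/2) i)
print(norm(a), norm(rotated))  # both close to sqrt(5)
```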


4.5.2 0 is not a Quantum State


The states we are to model correspond to unit vectors in H. We all get that, now. 
But what are the implications? 


• A state has to be a normalizable vector, which $\mathbf{0}$ is not. $\mathbf{0}$ will be the only
vector in $\mathcal{H}$ that does not correspond to a physical quantum state.


• As we already hammered home, every other vector does correspond to some
state, but it isn’t the only vector for that state. Any other vector that is a
scalar multiple of it represents the same state.


• The mathematical entity that we get if we take the collection of all rays, $\{[\mathbf{a}]\}$, as
its “points,” is a new construct with the name: complex projective sphere. This
new entity is, indeed, in one-to-one correspondence with the quantum states,
but ...


• ... the complex projective sphere is not a vector space, so we don’t want to go
too far in attempting to define it; any attempt to make a formal mathematical
entity just so that it corresponds, one-to-one, with the quantum states it models
results in a non-vector space. Among other things, there is no $\mathbf{0}$-vector in such
a projective sphere, thus, no vector addition.


The Drill. With this in mind, we satisfy ourselves with the following process. It 
may lack concreteness until we start working on specific problems, but it should give 
you the feel for what’s in store. 


1. Identify a unit vector, $\hat{\mathbf{v}} \in \mathcal{H}$, corresponding to the quantum state of our
problem.


2. Sometimes we will work with a scalar multiple, $\mathbf{v} = \alpha \hat{\mathbf{v}} \in \mathcal{H} = \mathbb{C}^n$, because it
simplifies our computations and we know that this vector lies on the same ray
as the original. Eventually, we’ll re-normalize by dividing by $\alpha$ to bring it back
to the projective sphere.


3. Often we will apply a unitary transformation, U, directly to $\hat{\mathbf{v}}$. $U\hat{\mathbf{v}}$ will be
a unit vector because, by definition (upcoming lecture on linear transformations),
unitary transformations preserve distances. Thus, U keeps vectors on
the projective sphere.


4. In all cases, we will take care to apply valid operations to our unit “state vector,” 
making sure that we end up with an answer which is also a unit vector on the 
projective sphere. 
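Steps 1 and 2 of the drill can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors are plain lists of complex numbers, the helper names `norm` and `normalize` are invented for this sketch, and the sample numbers are arbitrary. Note that dividing by the norm removes only the length of the scalar factor $a$; its phase survives, which is harmless because both vectors still lie on the same ray.

```python
import cmath
import math

def norm(v):
    """Length of a complex vector given as a list of numbers."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def normalize(v):
    """Return the unit vector on the same ray; the zero vector has none."""
    n = norm(v)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [c / n for c in v]

# Step 1: a unit vector (arbitrary sample values, not from the text).
v_hat = normalize([1 + 1j, 2 - 1j])

# Step 2: work with a scalar multiple a * v_hat, then re-normalize.
a = 3 - 4j
scaled = [a * c for c in v_hat]
renormalized = normalize(scaled)

# What is left over is the phase of a, a unit-modulus factor e^{i theta}.
leftover_phase = cmath.exp(1j * cmath.phase(a))
```

Here `renormalized` equals `leftover_phase` times `v_hat`, entry by entry, so the two unit vectors represent the same state.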


4.5.3. Why? 


There is one question (at least) that you should ask and demand be answered.


Why is this called a “projective sphere?” Good question. Since the states
of our quantum system are rays in $\mathcal{H}$, and we would prefer to visualize vectors as
points, not rays, we go back to the underlying $\mathbb{C}^n$ and project the entire ray (maybe
collapse would be a better word) onto the surface of an $n$-dimensional sphere (whose
real dimension is actually $2(n-1)$, but never mind that). We are projecting all those
representatives onto a single point on the complex $n$-sphere. (See Figure 4.5.) [Caution:
Each point on that sphere still has infinitely many representatives, impossible
to picture, due to a potential scalar factor $e^{i\theta}$, for real $\theta$.]

None of this is to say that scalar multiples, a.k.a. phase changes, never matter. 
When we start combining vectors in H, their relative phase will become important, 
and so we shall need to retain individual scalars associated with each component 
n-tuple. Don’t be intimidated; we’ll get to that in cautious, deliberate steps. 




4.6 Almost There 


We have one last math lesson to dance through after which we will be ready to 
learn graduate level quantum mechanics (and do so without any prior knowledge of 
undergraduate quantum mechanics). This final topic is linear transformations. Rest 
up. Then attack it. 




Chapter 5 


Linear Transformations 


5.1 Linear Transformations for Quantum Computing


5.1.1 A Concept More Fundamental Than the Matrix 


In the last lecture we completed the mathematical preliminaries needed to formally 
define a qubit, the vector object that will model a physical memory location in a 
quantum computer. Of course, we haven’t officially defined the qubit yet, but at 
least we have a pretty good idea that it will be associated with a vector on the 
projective sphere of some Hilbert space. We’re fully armed to deal with qubits when 
the day comes (very soon, I promise). 


This lesson will fill the remaining gap in your math: it provides the theory and 
computational skills needed to work with quantum logic gates, the hardware that 
processes qubits. 


The introductory remarks of some prior lessons suggested that a quantum logic 
gate will be described by a matrix. Consider the Y operator, a quantum gate that 
we'll learn about later in the course. We’ll see that Y takes one qubit in and produces 
one qubit out, 


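[Circuit diagram: a single qubit passing through a gate labeled Y.]


and its matrix will be described by

\[ Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \]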


Fair enough, except for one small problem. 


A matrix for a logic gate implicitly assumes that there is an underlying basis that 
is being used in its construction. Since we know that every vector space can have 
many bases, it will turn out that “the” matrix for any logic gate like Y will mutate
depending on which basis we are using. For example, there is a basis in which the same
logic gate has a different matrix, i.e., 


\[ Y = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]


This should disturb you. If a matrix defines a quantum logic gate, and there can be 
more than one matrix describing that gate, how can we ever know anything? 


There is a more fundamental concept than the matrix of linear algebra, that of 
the linear transformation. As you'll learn today, a linear transformation is the basis- 
independent entity that describes a logic gate. While a linear transformation’s matrix 
will change depending on which underlying basis we use to construct it, its life-giving 
linear transformation remains fixed. You can wear different clothes, but underneath 
you're still you. 


5.1.2 The Role of Linear Transformations in Quantum Computing


There are many reasons we care about linear transformations. I'll list a few to give 
you some context, and we’ll discover more as we go. 


• Every algorithm in quantum computing will require taking a measurement. In
quantum mechanics, a measurement is associated with a certain type of linear
transformation. Physicists call it by the name Hermitian operator, which we’ll
define shortly. Such beasts are, at their core, linear transformations. [Beware:
I did not say that taking a measurement was a linear transformation; it is not.
I said that every measurement is associated with a linear transformation.]


• In quantum computing we’ll be replacing our old logic gates (AND, XOR, etc.)
with a special kind of quantum gate whose basis-independent entity is called a
unitary operator. But a unitary operator is nothing more than a linear
transformation which has some additional properties.


• We often want to express the same quantum state (or qubit) in different bases.
One way to convert the coordinates of the qubit from one basis to another is to
subject the coordinate vector to a linear transformation.


5.2 Definitions and Examples 


5.2.1 Actions as well as Name Changes 


Linear transformations are the verbs of vector spaces. They can map vectors of one
vector space, $V$, into a different space, $W$,

\[ V \xrightarrow{\;T\;} W, \]

or they can move vectors around, keeping them in the same space,

\[ V \xrightarrow{\;T\;} V. \]

They describe actions that we take on our vectors. They can move a vector by
mapping it onto another vector. They can also be applied to vectors that don’t move
at all. For example, we often want to expand a vector along a basis that is different
from the one originally provided, and linear transformations help us there as well.


5.2.2 Formal Definition of Linear Transformation 


A linear transformation, $T$, is a map from a vector space (the domain) into itself or
into another vector space (the range),

\[ V \xrightarrow{\;T\;} W \quad (W \text{ could be } V), \]

that sends “input” vectors to “output” vectors,

\[ T : \mathbf{v} \longmapsto \mathbf{w}. \]

Figure 5.1: $T$ mapping a vector $\mathbf{v}$ in $\mathbb{R}^3$ to a $\mathbf{w}$ in $\mathbb{R}^3$

That describes the mapping aspect (Figure 5.1). However, a linear transformation
has the following additional properties that allow us to call the mapping linear:

\[
\begin{aligned}
T(c\mathbf{v}) &= c\,T(\mathbf{v}) \quad \text{and} \\
T(\mathbf{v}_1 + \mathbf{v}_2) &= T(\mathbf{v}_1) + T(\mathbf{v}_2).
\end{aligned}
\]

Here, $c$ is a scalar of the domain vector space and $\mathbf{v}_1$, $\mathbf{v}_2$ are domain vectors. These
conditions have to be satisfied for all vectors and scalars.
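The two conditions can be spot-checked numerically. The sketch below is a sanity test on a handful of sample vectors and scalars, not a proof; the helper `looks_linear` and both sample maps are hypothetical names chosen for this illustration. Complex conjugation is included because it respects addition yet fails the scalar condition for non-real $c$.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    return [c * a for a in v]

def close(u, v, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(u, v))

def looks_linear(T, vectors, scalars):
    """Check T(cv) = cT(v) and T(v1 + v2) = T(v1) + T(v2) on the samples only."""
    for v1 in vectors:
        for c in scalars:
            if not close(T(scale(c, v1)), scale(c, T(v1))):
                return False
        for v2 in vectors:
            if not close(T(add(v1, v2)), add(T(v1), T(v2))):
                return False
    return True

def swap_and_double(v):
    """Hypothetical linear map: (x, y) -> (2y, 2x)."""
    x, y = v
    return [2 * y, 2 * x]

def conjugate(v):
    """Additive, but not linear over the complex scalars."""
    return [complex(a).conjugate() for a in v]

samples = [[1 + 2j, -1j], [0.5, 3 - 1j], [2, 2]]
scalars = [2, -1.5, 1j, 1 - 2j]

linear_ok = looks_linear(swap_and_double, samples, scalars)
conjugate_ok = looks_linear(conjugate, samples, scalars)
```

A `True` result only says no counterexample was found among the samples; a `False` result, as for `conjugate`, is a genuine counterexample to linearity.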

Besides possibly being different spaces, the domain and range can also have different
dimensions. However, today, we’ll just work with linear transformations of a
space into itself.


A linear transformation can be interpreted many different ways. When learning
about them, it’s best to think of them as mappings or positional changes which
convert a vector (having a certain direction and length) into a different vector (having
a different direction and length).


Notation. Sometimes the parentheses are omitted when applying a linear
transformation to a vector:

\[ T\mathbf{v} \equiv T(\mathbf{v}). \]


Famous Linear Transformations 


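Here are a few useful linear transformations, some of which refer to $\mathbb{R}^n$ or $\mathbb{C}^n$, some
to function spaces not studied much in this course. I’ve included figures for a few
that we’ll meet later today.

$\mathbf{v}$ is any vector in an $n$-dimensional vector space on which the linear transformation
acts. $\hat{\mathbf{x}}_k$ is the $k$th natural basis vector, $v_k$ is the $k$th coordinate of $\mathbf{v}$ in the natural
basis, $c$ is a scalar, and $\hat{\mathbf{n}}$ is any unit vector.

\[
\begin{aligned}
\mathbb{1}(\mathbf{v}) &= \mathbf{v} && \text{(Identity)} \\
0(\mathbf{v}) &= \mathbf{0} && \text{(Zero)} \\
S_c(\mathbf{v}) &= c\,\mathbf{v} && \text{(Scale) (Figure 5.2)} \\
P_k(\mathbf{v}) &= v_k\,\hat{\mathbf{x}}_k && \text{(Projection onto } \hat{\mathbf{x}}_k\text{) (Figure 5.3)} \\
P_{\hat{\mathbf{n}}}(\mathbf{v}) &= (\mathbf{v}\cdot\hat{\mathbf{n}})\,\hat{\mathbf{n}} && \text{(Projection onto } \hat{\mathbf{n}}\text{) (Figure 5.4)} \\
D(f) &= f' && \text{(Differentiation)} \\
\textstyle\int^{x}(f) &= \int^{x} f(x')\,dx' && \text{(Anti-differentiation)} \\
T_A(\mathbf{v}) &= A\mathbf{v} && \text{(Multiplication by matrix of constants, } A\text{)}
\end{aligned}
\]
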

Figure 5.2: A scaling transformation 


Figure 5.3: Projection onto the direction $\hat{\mathbf{z}}$, a.k.a. $\hat{\mathbf{x}}_3$


Figure 5.4: Projection onto an arbitrary direction $\hat{\mathbf{n}}$


Let’s pick one and demonstrate that it is, indeed, linear. We can prove both
conditions at the same time using the vector $c\mathbf{v}_1 + \mathbf{v}_2$ as a starting point. I’ve
selected the projection onto an arbitrary direction, $\hat{\mathbf{n}}$, as the example.


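\[
\begin{aligned}
P_{\hat{\mathbf{n}}}(c\mathbf{v}_1 + \mathbf{v}_2)
 &= \left[(c\mathbf{v}_1 + \mathbf{v}_2)\cdot\hat{\mathbf{n}}\right]\hat{\mathbf{n}}
  = \left[(c\mathbf{v}_1)\cdot\hat{\mathbf{n}} + \mathbf{v}_2\cdot\hat{\mathbf{n}}\right]\hat{\mathbf{n}} \\
 &= \left[c\,(\mathbf{v}_1\cdot\hat{\mathbf{n}}) + \mathbf{v}_2\cdot\hat{\mathbf{n}}\right]\hat{\mathbf{n}}
  = \left[c\,(\mathbf{v}_1\cdot\hat{\mathbf{n}})\right]\hat{\mathbf{n}} + \left[\mathbf{v}_2\cdot\hat{\mathbf{n}}\right]\hat{\mathbf{n}} \\
 &= c\left[(\mathbf{v}_1\cdot\hat{\mathbf{n}})\right]\hat{\mathbf{n}} + \left[\mathbf{v}_2\cdot\hat{\mathbf{n}}\right]\hat{\mathbf{n}} \\
 &= c\,P_{\hat{\mathbf{n}}}(\mathbf{v}_1) + P_{\hat{\mathbf{n}}}(\mathbf{v}_2). \qquad \text{QED}
\end{aligned}
\]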


We also listed an example in which $T_A$ was defined by the multiplicative action
of a matrix, $A$, consisting of scalar constants (i.e., the matrix can’t have variables or
formulas in it). To make this precise, let $A$ have size $m \times n$ and $\mathbf{v}$ be any $n$-dimensional
vector. (They both have to use the same field of scalars, say $\mathbb{R}$ or $\mathbb{C}$.) From the rules
of matrix multiplication we can easily see that

\[
\begin{aligned}
A(\mathbf{v} + \mathbf{w}) &= A(\mathbf{v}) + A(\mathbf{w}) \quad \text{and} \\
A(c\mathbf{v}) &= c\,A(\mathbf{v}).
\end{aligned}
\]

Therefore,

\[ T_A(\mathbf{v}) \equiv A\mathbf{v} \]

is linear. This is the linear transformation, $T_A$, induced by the matrix $A$. We’ll look
at it more closely in a moment.


[Exercise. Prove these two claims about matrix-multiplication.]




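[Exercise. Look at these mappings of $\mathbb{C}^3$ into itself.

\[ T_1 : \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix} \]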

\[ T_2 : \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} 2i\,x \\ \sqrt{3}\,y \\ (4 - 1i)\,z \end{pmatrix} \]

x ro 
T3: | y > y + 22 

z z+/2 

\[ T_4 : \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} -y \\ x \\ z \end{pmatrix} \]

x xy 
Ts : i] -—> Yy 

Zz os 

\[ T_6 : \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} 0 \\ xyz \\ 0 \end{pmatrix} \]


Which are linear, which are not? Support each claim with a proof or counterexample.]


5.3 The Special Role of Bases 


Basis vectors play a powerful role in the study of linear transformations as a conse- 
quence of linearity. 


Let’s pick any basis for our domain vector space (it doesn’t have to be the natural
basis or even be orthonormal),

\[ B = \{\mathbf{b}_1, \mathbf{b}_2, \ldots\}. \]

The definition of basis means we can express any vector as a linear combination
of the $\mathbf{b}_k$s (uniquely) using the appropriate scalars, $\beta_k$, from the underlying field.
Now, choose any random vector from the space and expand it along this basis (and
remember there is only one way to do this for every basis),

\[ \mathbf{v} = \sum_{k=1}^{n} \beta_k\,\mathbf{b}_k. \]

We now apply $T$ to $\mathbf{v}$ and make use of the linearity (the sum could be infinite) to get

\[ T\mathbf{v} = T\!\left(\sum_{k=1}^{n} \beta_k\,\mathbf{b}_k\right) = \sum_{k=1}^{n} \beta_k\,T(\mathbf{b}_k). \]


What does this say? It tells us that if we know what $T$ does to the basis vectors,
we know what it does to all vectors in the space. Let’s say $T$ is some
hard-to-determine function which is actually not known analytically (by formula), but by
experimentation we are able to determine its action on the basis. We can extend that
knowledge to any vector because the last result tells us that the coordinates of the
vector combined with the known values of $T$ on the basis are enough. In short, the
small set of vectors

\[ \{\,T(\mathbf{b}_1),\ T(\mathbf{b}_2),\ \ldots\,\} \]

completely determines $T$.


5.3.1 Application: Rotations in Space 


We now apply the theory to a linear transformation which seems to rotate points by
90° counter-clockwise in $\mathbb{R}^2$. Let’s call the rotation $R_{\pi/2}$, since $\pi/2$ radians is 90°.


The plan


We must first believe that a rotation is linear. I leave this to you.

[Exercise. Argue, heuristically, that $R_{\pi/2}$ is linear. Hint: Use the equations that
define linearity, but proceed intuitively, since you don’t yet have a formula for $R_{\pi/2}$.]


Next, we will use geometry to easily find the action of $T$ on the two standard basis
vectors, $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}\}$. Finally, we’ll extend that to all of $\mathbb{R}^2$, using linearity. Now for the
details.


The Details 


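Figures 5.5 and 5.6 show the result of the rotation on the two basis vectors.


Figure 5.5: Rotation of $\hat{\mathbf{x}}$ counter-clockwise by $\pi/2$


Figure 5.6: Rotation of $\hat{\mathbf{y}}$ counter-clockwise by $\pi/2$


These figures suggest that, in $\mathbb{R}^2$,

\[
\begin{aligned}
R_{\pi/2}(\hat{\mathbf{x}}) &= \hat{\mathbf{y}} \quad \text{and} \\
R_{\pi/2}(\hat{\mathbf{y}}) &= -\hat{\mathbf{x}};
\end{aligned}
\]

therefore, for any $\mathbf{v}$, with natural basis coordinates $(v_x, v_y)^t$, linearity allows us to
write

\[
\begin{aligned}
R_{\pi/2}(\mathbf{v}) &= R_{\pi/2}(v_x\,\hat{\mathbf{x}} + v_y\,\hat{\mathbf{y}}) \\
 &= v_x\,R_{\pi/2}(\hat{\mathbf{x}}) + v_y\,R_{\pi/2}(\hat{\mathbf{y}}) \\
 &= v_x\,\hat{\mathbf{y}} - v_y\,\hat{\mathbf{x}}.
\end{aligned}
\]

From knowledge of the linear transformation on the basis alone, we were able to derive
a formula applicable to the entire space. Stated in column vector form,

\[ R_{\pi/2}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}, \]

and we have our formula for all space (assuming the natural basis coordinates). It’s
that easy.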
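The same reasoning can be mirrored in code. The sketch below assumes vectors are lists of natural-basis coordinates; `extend_linearly` is a hypothetical helper, invented here, that builds a transformation from nothing but its values on the natural basis vectors.

```python
def extend_linearly(images):
    """Build T from images[k] = T(k-th natural basis vector), using linearity."""
    def T(v):
        dim = len(images[0])
        out = [0] * dim
        for coord, image in zip(v, images):
            # add coord * T(basis vector) to the running sum
            for j in range(dim):
                out[j] += coord * image[j]
        return out
    return T

# Only the two facts read off the figures: R(x_hat) = y_hat, R(y_hat) = -x_hat.
quarter_turn = extend_linearly([[0, 1], [-1, 0]])

print(quarter_turn([3, 5]))  # [-5, 3]
```

The result agrees with the column-vector formula, which sends $(x, y)^t$ to $(-y, x)^t$.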


[Exercise. Develop the formula for a counter-clockwise rotation (again in $\mathbb{R}^2$)
through an arbitrary angle, $\theta$. Show your derivation based on its effect on the natural
basis vectors.]


[Exercise. What is the formula for a rotation, $R_{z,\pi/2}$, about the $z$-axis in $\mathbb{R}^3$
through a 90° angle, counter-clockwise when looking down from the positive $z$-axis?
Show your derivation based on its effect on the natural basis vectors, $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}, \hat{\mathbf{z}}\}$.]


5.4 The Matrix of a Linear Transformation 


5.4.1 From Matrix to Linear Transformation 


You showed in one of your exercises above that any matrix $A$ (containing scalar
constants) induces a linear transformation $T_A$. Specifically, say $A$ is a complex matrix
of size $m \times n$ and $\mathbf{v}$ is a vector in $\mathbb{C}^n$. Then the formula

\[ T_A(\mathbf{v}) = A\mathbf{v} \]

defines a mapping

\[ \mathbb{C}^n \xrightarrow{\;T_A\;} \mathbb{C}^m, \]

which turns out to be linear. As you see, both $A$, and therefore, $T_A$, might map
vectors into a different-sized vector space; sometimes $m > n$, sometimes $m < n$ and
sometimes $m = n$.


5.4.2 From Linear Transformation to Matrix 


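We can go the other way. Starting with a linear transformation, $T$, we can construct
a matrix $M_T$ that represents it. There is a little fine print here which we’ll get to in
a moment (and you may sense it even before we get there).

For simplicity, assume we are working in a vector space and using some natural
basis $A = \{\mathbf{a}_k\}$. (Think $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}, \hat{\mathbf{z}}\}$ of $\mathbb{R}^3$.) So every vector can be expanded along $A$
by

\[ \mathbf{v} = \sum_{k=1}^{n} \alpha_k\,\mathbf{a}_k. \]

We showed that the action of $T$ on the few vectors in basis $A$ completely determines
its definition on all vectors $\mathbf{v}$. This happened because of linearity,

\[ T\mathbf{v} = \sum_{k=1}^{n} \alpha_k\,T(\mathbf{a}_k). \]

Let’s write the sum in a more instructive way as the formal (but not quite legal) “dot
product” of a (row of vectors) with a (column of scalars). Shorthand for the above
sum then becomes

\[ T\mathbf{v} = \Big( T(\mathbf{a}_1),\ T(\mathbf{a}_2),\ \ldots,\ T(\mathbf{a}_n) \Big) \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \]

Now we expand each vector $T(\mathbf{a}_k)$ vertically into its coordinates relative to the same
basis, $A$, and we will have a legitimate product,

\[ T\mathbf{v} = \begin{pmatrix} (T\mathbf{a}_1)_1 & (T\mathbf{a}_2)_1 & \cdots & (T\mathbf{a}_n)_1 \\ (T\mathbf{a}_1)_2 & (T\mathbf{a}_2)_2 & \cdots & (T\mathbf{a}_n)_2 \\ \vdots & \vdots & \ddots & \vdots \\ (T\mathbf{a}_1)_n & (T\mathbf{a}_2)_n & \cdots & (T\mathbf{a}_n)_n \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}. \]

Writing $a_{jk}$ in place of $(T\mathbf{a}_k)_j$, we have the simpler statement,

\[ T\mathbf{v} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}, \]

which reveals that $T$ is nothing more than multiplication by a matrix made up of the
constants $a_{jk}$.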


Executive Summary. To get a matrix, $M_T$, for any linear transformation, $T$,
form a matrix whose columns are $T$ applied to each basis vector.


We can then multiply any vector $\mathbf{v}$ by the matrix $M_T = (a_{jk})$ to get $T(\mathbf{v})$.


Notation


Because this duality between matrices and linear transformations is so tight, we rarely
bother to distinguish one from the other. If we start with a linear transformation, $T$,
we just use $T$ as its matrix (and do away with the notation $M_T$). If we start with a
matrix, $A$, we just use $A$ as its induced linear transformation (and do away with the
notation $T_A$).


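Example


We’ve got the formula for a linear transformation that rotates vectors counter-clockwise
in $\mathbb{R}^2$, namely,

\[ R_{\pi/2}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}. \]

To compute its matrix relative to the natural basis, $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}\}$, we form

\[ M_{R_{\pi/2}} = \Big( R_{\pi/2}(\hat{\mathbf{x}}),\ R_{\pi/2}(\hat{\mathbf{y}}) \Big) = \left( R_{\pi/2}\begin{pmatrix} 1 \\ 0 \end{pmatrix},\ R_{\pi/2}\begin{pmatrix} 0 \\ 1 \end{pmatrix} \right) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]

We can verify that it works by multiplying this matrix by an arbitrary vector,

\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0\cdot x - 1\cdot y \\ 1\cdot x + 0\cdot y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}, \]

as required.


[Exercise. Show that the matrix for the scaling transformation,

\[ S_{3i}(\mathbf{v}) = 3i\,\mathbf{v}, \]

is

\[ M_{S_{3i}} = \begin{pmatrix} 3i & 0 \\ 0 & 3i \end{pmatrix}, \]

and verify that it works by multiplying this matrix by an arbitrary vector to recover
the definition of $S_{3i}$.]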
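The executive summary translates almost word for word into code. In this sketch, `matrix_of` and `mat_vec` are hypothetical helper names, matrices are stored as lists of rows, and the natural basis is assumed throughout; it rebuilds the rotation matrix from the example above.

```python
def matrix_of(T, dim):
    """Matrix of T in the natural basis: column k is T(k-th basis vector)."""
    columns = []
    for k in range(dim):
        e_k = [1 if j == k else 0 for j in range(dim)]
        columns.append(T(e_k))
    # Convert the list of columns into a list of rows.
    return [[columns[k][j] for k in range(dim)] for j in range(len(columns[0]))]

def mat_vec(M, v):
    """Ordinary matrix-times-column-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def rotate(v):
    """The 90-degree counter-clockwise rotation, given by its formula."""
    x, y = v
    return [-y, x]

M = matrix_of(rotate, 2)
print(M)                   # [[0, -1], [1, 0]]
print(mat_vec(M, [3, 5]))  # [-5, 3]
```

Multiplying by `M` reproduces `rotate` on every input, which is the sense in which the matrix and the transformation are interchangeable once a basis is fixed.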




5.4.3. Dependence of a Matrix on Basis 


When we are using an obvious and natural orthonormal basis (like $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}, \hat{\mathbf{z}}\}$ for $\mathbb{C}^3$),
everything is fairly straightforward. However, there will be times when we want a
different basis than the one we thought, originally, was the preferred or natural one. The
rule is very strict, but simple.


In order to enable matrix multiplication to give us the result of some linear
$T$ applied to a vector $\mathbf{v}$, we must make sure $T$ and $\mathbf{v}$ are both expressed
in a common basis. In that case, $\mathbf{w} = M_T \cdot \mathbf{v}$ only makes sense if both
vectors and all the columns of $M_T$ are expanded along that common basis.


In short, don’t mix bases when you are expressing vectors and linear transformations. 
And don’t forget what your underlying common basis is, especially if it’s not the 
preferred basis. 


Remember, there is always some innate definition of our vectors, independent of 
basis. Distinct from this, we have the expression of those vectors in a basis. 


Say $\mathbf{v}$ is a vector. If we want to express it using coordinates along the $A$ basis on
Sunday, then “cabaret” along the $B$ basis all day Monday, that’s perfectly fine. They
are all the same object, and this notation could be employed to express that fact:

\[ \mathbf{v} = \mathbf{v}\big|_A = \mathbf{v}\big|_B. \]

This is confusing for beginners, because they usually start with vectors in $\mathbb{R}^2$, which
have innate coordinates, $\begin{pmatrix} x \\ y \end{pmatrix}$, that really don’t refer to a basis. As we’ve seen,
though, the vector happens to look the same as its expression in the natural basis,
$\{\hat{\mathbf{x}}, \hat{\mathbf{y}}\}$, so without the extra notation, above, there’s no way to know (nor is there
usually the need to know) whether the author is referring to the vector, or its
coordinates in the natural basis. If we ever want to disambiguate the conversation, we will
use the notation, $\mathbf{v}\big|_A$, where $A$ is some basis we have previously described.


The same goes for $T$’s matrix. If I want to describe $T$ in a particular basis, I will
say something like

\[ T = T\big|_A = T\big|_B, \]

and now I don’t even need to use $M_T$, since the $\big|_A$ or $\big|_B$ implies we are talking about
a matrix.


So the basis-free statement,

\[ \mathbf{w} = T(\mathbf{v}), \]

can be viewed in one basis as

\[ \mathbf{w}\big|_A = T\big|_A \cdot \mathbf{v}\big|_A, \]

and in another basis as

\[ \mathbf{w}\big|_B = T\big|_B \cdot \mathbf{v}\big|_B. \]


Matrix of Mr in a Non-Standard Basis 


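The development of our formula of a matrix $M_T$ of a transformation, $T$, led to

\[ M_T = \Big( T(\mathbf{a}_1),\ T(\mathbf{a}_2),\ \ldots,\ T(\mathbf{a}_n) \Big) \]

when we were assuming the preferred basis,

\[ A = \{\mathbf{a}_1, \mathbf{a}_2, \ldots\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ \vdots \end{pmatrix},\ \begin{pmatrix} 0 \\ 1 \\ \vdots \end{pmatrix},\ \ldots \right\}. \]

There was nothing special about the preferred basis in this formula; if we had any
basis $B$ (even one that was non-orthonormal) the formula would still be

\[ T\big|_B = \Big( T(\mathbf{b}_1)\big|_B,\ T(\mathbf{b}_2)\big|_B,\ \ldots,\ T(\mathbf{b}_n)\big|_B \Big). \]

The way to see this most easily is to first note that in any basis $B$, each basis vector
$\mathbf{b}_k$, when expressed in its own $B$-coordinates, looks exactly like the $k$th preferred basis
element, i.e.,

\[ \mathbf{b}_k\big|_B = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \longleftarrow k\text{th element}. \]

[Exercise. Prove it.]


Now we can see that the proposed $T\big|_B$ applied to $\mathbf{b}_k\big|_B$ gives the correct result. Writing
everything in the $B$ basis (some labels omitted) and using matrix multiplication we
have

\[ \Big( \cdots,\ T(\mathbf{b}_k)\big|_B,\ \cdots \Big) \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = T(\mathbf{b}_k)\big|_B. \quad \checkmark \]


Example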


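The transformation

\[ T : \begin{pmatrix} x \\ y \end{pmatrix} \longmapsto \begin{pmatrix} x \\ 3i\,y \end{pmatrix} \]

in the context of the vector space $\mathbb{C}^2$ is going to have the preferred-basis matrix

\[ M_T = \begin{pmatrix} 1 & 0 \\ 0 & 3i \end{pmatrix}. \]

[Exercise. Prove it.]


Let’s see what $T$ looks like when expressed in the non-orthogonal basis

\[ C = \{\mathbf{c}_1, \mathbf{c}_2\} = \left\{ \begin{pmatrix} 2 \\ 0 \end{pmatrix},\ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. \]

[Exercise. Prove that this $C$ is a basis for $\mathbb{C}^2$.]


Using our formula, we get

\[ T\big|_C = \Big( T(\mathbf{c}_1)\big|_C,\ T(\mathbf{c}_2)\big|_C \Big). \]

We compute each $T(\mathbf{c}_k)\big|_C$, $k = 1, 2$.


First up, $T(\mathbf{c}_1)\big|_C$:

\[ T(\mathbf{c}_1) = T\begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}. \]

Everything needs to be expressed in the $C$ basis, so we show that for this vector:

\[ \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 1\begin{pmatrix} 2 \\ 0 \end{pmatrix} + 0\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}_C, \]

so this last column vector will be the first column of our matrix.


Next, $T(\mathbf{c}_2)\big|_C$:

\[ T(\mathbf{c}_2) = T\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3i \end{pmatrix}. \]

We have to express this, too, in the $C$ basis, a task that requires a modicum of algebra:

\[ \begin{pmatrix} 1 \\ 3i \end{pmatrix} = \alpha\begin{pmatrix} 2 \\ 0 \end{pmatrix} + \beta\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Solving this system should be no problem for you (exercise), giving,

\[ \alpha = \frac{1 - 3i}{2}, \qquad \beta = 3i, \]

so

\[ T(\mathbf{c}_2)\big|_C = \begin{pmatrix} \frac{1-3i}{2} \\ 3i \end{pmatrix}_C \quad\text{and therefore}\quad T\big|_C = \begin{pmatrix} 1 & \frac{1-3i}{2} \\ 0 & 3i \end{pmatrix}. \]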
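The column-by-column recipe for a non-standard basis can be checked numerically. The sketch below assumes the example's transformation is $T(x, y)^t = (x, 3i\,y)^t$ and that the basis is $C = \{(2,0)^t, (1,1)^t\}$ (both are our reading of the example, stated here as assumptions); `coords_in_basis` is a hypothetical helper that solves the $2 \times 2$ system by Cramer's rule.

```python
def coords_in_basis(w, c1, c2):
    """Solve w = alpha*c1 + beta*c2 for 2-dimensional vectors (Cramer's rule)."""
    det = c1[0] * c2[1] - c2[0] * c1[1]
    if det == 0:
        raise ValueError("c1 and c2 do not form a basis")
    alpha = (w[0] * c2[1] - c2[0] * w[1]) / det
    beta = (c1[0] * w[1] - w[0] * c1[1]) / det
    return [alpha, beta]

def T(v):
    """Assumed example transformation: leave x alone, multiply y by 3i."""
    return [v[0], 3j * v[1]]

c1, c2 = [2, 0], [1, 1]

# Each column of the matrix in the C basis is T(c_k) written in C-coordinates.
column_1 = coords_in_basis(T(c1), c1, c2)
column_2 = coords_in_basis(T(c2), c1, c2)
```

Under these assumptions `column_1` comes out as $(1, 0)$ and `column_2` as $((1-3i)/2,\ 3i)$, matching the hand computation.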


5.4.4 The Transformation in an Orthonormal Basis 


When we have an orthonormal basis, things get easy. We start our analysis by working 
in a natural, orthonormal basis, where all vectors and matrices are simply written 
down without even thinking about the basis, even though it’s always there, lurking 
behind the page. Today, we’re using A to designate the natural basis, so 


\[ \mathbf{v} = \mathbf{v}\big|_A \]


1+iV2 
3924) 


would have the same coordinates on both sides of the equations, perhaps ( 


Formula for Matrix Elements Using the Inner Product Trick 


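Even though we already know how to get the matrix in the preferred basis, $A$, let’s
analyze it, briefly. Take the scaling transformation $S_{\sqrt{2}/2}$ in $\mathbb{R}^2$ whose matrix is

\[ \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} \end{pmatrix}. \]

(You constructed the matrix for an $S_c$ in an earlier exercise today using the scaling
factor $c = 3i$, but just to summarize, we apply $S_{\sqrt{2}/2}$ to the two vectors $(1, 0)^t$ and
$(0, 1)^t$, then make the output vectors the columns of the required matrix.) Remember,
each column is computed by listing the coordinates of $T(\hat{\mathbf{x}})$ and $T(\hat{\mathbf{y}})$ in the natural
basis. But for either vector, its coordinates can be expressed using the “dotting” trick
because we have an orthonormal basis. So, writing $S$ for $S_{\sqrt{2}/2}$,

\[ \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} \hat{\mathbf{x}}\cdot S(\hat{\mathbf{x}}) & \hat{\mathbf{x}}\cdot S(\hat{\mathbf{y}}) \\ \hat{\mathbf{y}}\cdot S(\hat{\mathbf{x}}) & \hat{\mathbf{y}}\cdot S(\hat{\mathbf{y}}) \end{pmatrix}. \]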

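This is true for any $T$ and the preferred basis in $\mathbb{R}^2$. To illustrate, let’s take the upper
left component. It’s given by the dot product

\[ T_{11} = \langle \hat{\mathbf{x}} \,|\, T(\hat{\mathbf{x}}) \rangle. \]

Similarly, the other three are given by

\[
\begin{aligned}
T_{21} &= \langle \hat{\mathbf{y}} \,|\, T(\hat{\mathbf{x}}) \rangle, \\
T_{12} &= \langle \hat{\mathbf{x}} \,|\, T(\hat{\mathbf{y}}) \rangle \quad \text{and} \\
T_{22} &= \langle \hat{\mathbf{y}} \,|\, T(\hat{\mathbf{y}}) \rangle.
\end{aligned}
\]

In other words,

\[ T\big|_A = \begin{pmatrix} \langle \hat{\mathbf{x}} \,|\, T(\hat{\mathbf{x}}) \rangle & \langle \hat{\mathbf{x}} \,|\, T(\hat{\mathbf{y}}) \rangle \\ \langle \hat{\mathbf{y}} \,|\, T(\hat{\mathbf{x}}) \rangle & \langle \hat{\mathbf{y}} \,|\, T(\hat{\mathbf{y}}) \rangle \end{pmatrix}. \]

To make things crystal clear, let’s rename the natural basis vectors

\[ A = \{\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2\}, \]

producing the very illuminating

\[ T\big|_A = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} = \begin{pmatrix} \langle \hat{\mathbf{e}}_1 \,|\, T(\hat{\mathbf{e}}_1) \rangle & \langle \hat{\mathbf{e}}_1 \,|\, T(\hat{\mathbf{e}}_2) \rangle \\ \langle \hat{\mathbf{e}}_2 \,|\, T(\hat{\mathbf{e}}_1) \rangle & \langle \hat{\mathbf{e}}_2 \,|\, T(\hat{\mathbf{e}}_2) \rangle \end{pmatrix}. \]

But this formula, and the logic that led to it, would work for any orthonormal basis,
not just $A$, and in any vector space, not just $\mathbb{R}^2$.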


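Summary. The $jk$th matrix element for the transformation, $T$, in an orthonormal
basis,

\[ B = \{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n\}, \]

is given by

\[ T_{jk}\big|_B = \langle \mathbf{b}_j \,|\, T(\mathbf{b}_k) \rangle, \]

so

\[ T\big|_B = \begin{pmatrix} \langle \mathbf{b}_1 \,|\, T(\mathbf{b}_1) \rangle & \langle \mathbf{b}_1 \,|\, T(\mathbf{b}_2) \rangle & \cdots & \langle \mathbf{b}_1 \,|\, T(\mathbf{b}_n) \rangle \\ \langle \mathbf{b}_2 \,|\, T(\mathbf{b}_1) \rangle & \langle \mathbf{b}_2 \,|\, T(\mathbf{b}_2) \rangle & \cdots & \langle \mathbf{b}_2 \,|\, T(\mathbf{b}_n) \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{b}_n \,|\, T(\mathbf{b}_1) \rangle & \langle \mathbf{b}_n \,|\, T(\mathbf{b}_2) \rangle & \cdots & \langle \mathbf{b}_n \,|\, T(\mathbf{b}_n) \rangle \end{pmatrix}. \]

Not only that, but we don’t even have to start with a preferred basis to express our
$T$ and $B$ that are used in the formula. As long as $T$ and $B$ are both expressed in
the same basis (say some third $C$) we can use the coordinates and matrix elements
relative to $C$ to compute $\langle \mathbf{b}_j \,|\, T(\mathbf{b}_k) \rangle$ and thus give us the matrix of $T\big|_B$.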


Happy Note. That third basis, C, doesn’t even have to be orthonormal. As long 
as B, the desired basis in which we are seeking a new representation, is orthonormal, 
it all works. 
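The summary formula is short enough to try out directly. The sketch below is an illustration under assumptions: the coordinate-swapping map and the 45°-rotated basis are sample choices made for this example, and `inner` conjugates its first argument, which is the convention used in this lesson.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugating the first vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matrix_in_orthonormal_basis(T, basis):
    """Entry (j, k) is <b_j | T(b_k)>; only valid when `basis` is orthonormal."""
    return [[inner(b_j, T(b_k)) for b_k in basis] for b_j in basis]

def swap(v):
    """Hypothetical linear map that exchanges the two coordinates."""
    return [v[1], v[0]]

s = math.sqrt(2) / 2
B = [[s, s], [-s, s]]        # an assumed orthonormal basis of the plane
A = [[1, 0], [0, 1]]         # the natural basis

swap_in_A = matrix_in_orthonormal_basis(swap, A)
swap_in_B = matrix_in_orthonormal_basis(swap, B)
```

Up to rounding, `swap_in_A` has zeros on the diagonal and ones off it, while `swap_in_B` is diagonal with entries $1$ and $-1$: the same transformation, two different matrices.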




Example 1 


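Let’s represent the scaling transformation

\[ S_{\sqrt{2}/2}\big|_A = \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} \end{pmatrix} \]

in a basis that we encountered in a previous lesson (which was named $C$ then, but
we’ll call $B$ today, to make clear the application of the above formulas),

\[ B = \{\mathbf{b}_1, \mathbf{b}_2\} = \left\{ \begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix},\ \begin{pmatrix} -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix} \right\}. \]

We just plug in (intermediate $B$ labels omitted), writing $S$ for $S_{\sqrt{2}/2}$:

\[
\begin{aligned}
S\big|_B &= \begin{pmatrix} \langle \mathbf{b}_1 \,|\, S(\mathbf{b}_1) \rangle & \langle \mathbf{b}_1 \,|\, S(\mathbf{b}_2) \rangle \\ \langle \mathbf{b}_2 \,|\, S(\mathbf{b}_1) \rangle & \langle \mathbf{b}_2 \,|\, S(\mathbf{b}_2) \rangle \end{pmatrix} \\
 &= \begin{pmatrix} \left\langle \mathbf{b}_1 \,\middle|\, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} \right\rangle & \left\langle \mathbf{b}_1 \,\middle|\, \begin{pmatrix} -1/2 \\ 1/2 \end{pmatrix} \right\rangle \\ \left\langle \mathbf{b}_2 \,\middle|\, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} \right\rangle & \left\langle \mathbf{b}_2 \,\middle|\, \begin{pmatrix} -1/2 \\ 1/2 \end{pmatrix} \right\rangle \end{pmatrix} \\
 &= \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} \end{pmatrix}.
\end{aligned}
\]

That’s surprising (or not). The matrix is the same in the $B$ basis as it is in the $A$
basis.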


1) Does it make sense? 


2) Is this going to be true for all orthonormal bases and all transformations? 


These are the kinds of questions you have to ask yourself when you are manipulating
mathematical symbols in new and unfamiliar territory. I’ll walk you through it.


1) Does it make sense?


Let’s look at the result of

\[ S_{\sqrt{2}/2}\begin{pmatrix} -3 \\ 10 \end{pmatrix} \]

in both bases.


• First we’ll compute it in the $A$-basis and transform the resulting output vector
to the $B$-basis.


• When we’ve done that, we’ll start over, but this time first convert the starting
vector and matrix into $B$-basis coordinates and use those to compute the result,
giving us an answer in terms of the $B$-basis.


• We’ll compare the two answers to see if they are equal.


By picking a somewhat random-looking vector, $(-3, 10)^t$, and discovering that
both $T\big|_A$ and $T\big|_B$ turn it into equivalent output vectors, we will be satisfied that the
result makes sense; apparently it is true that the matrix for $S_{\sqrt{2}/2}$ looks the same
when viewed in both bases. But we haven’t tested that, so let’s do it.


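A-Basis. In the $A$ basis we already know that the output vector must be

\[ \begin{pmatrix} -\frac{3\sqrt{2}}{2} \\ 5\sqrt{2} \end{pmatrix}_A, \]

because that’s just the application of the transformation to the innate vector, which
is the same as applying the matrix to the preferred coordinates. We transform the
output vector into $B$-basis:

\[ \begin{pmatrix} -\frac{3\sqrt{2}}{2} \\ 5\sqrt{2} \end{pmatrix}_A = \begin{pmatrix} \left\langle \mathbf{b}_1 \,\middle|\, \begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix} \right\rangle \\ \left\langle \mathbf{b}_2 \,\middle|\, \begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix} \right\rangle \end{pmatrix}_B = \begin{pmatrix} \frac{7}{2} \\ \frac{13}{2} \end{pmatrix}_B. \]


B-Basis. Now, do it again, but this time convert the input vector, $\begin{pmatrix} -3 \\ 10 \end{pmatrix}$, to
$B$-coordinates, and run it through $S_{\sqrt{2}/2}\big|_B$. First the input vector:

\[ \begin{pmatrix} -3 \\ 10 \end{pmatrix}_A = \begin{pmatrix} \left\langle \mathbf{b}_1 \,\middle|\, \begin{pmatrix} -3 \\ 10 \end{pmatrix} \right\rangle \\ \left\langle \mathbf{b}_2 \,\middle|\, \begin{pmatrix} -3 \\ 10 \end{pmatrix} \right\rangle \end{pmatrix}_B = \begin{pmatrix} \frac{7\sqrt{2}}{2} \\ \frac{13\sqrt{2}}{2} \end{pmatrix}_B. \]

Finally, we apply $S_{\sqrt{2}/2}\big|_B$ to these $B$ coordinates:

\[ \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} \frac{7\sqrt{2}}{2} \\ \frac{13\sqrt{2}}{2} \end{pmatrix}_B = \begin{pmatrix} \frac{7}{2} \\ \frac{13}{2} \end{pmatrix}_B. \]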

And ... we get the same answer. Apparently there is no disagreement, and other tests
would give the same results, so it seems we made no mistake when we derived the
matrix for $S_{\sqrt{2}/2}\big|_B$ and got the same answer as $S_{\sqrt{2}/2}\big|_A$.


This is not a proof, but it is easy enough to do (next exercise). A mathematically-minded
student might prefer to just do the proof and be done with it, while an
applications-oriented person may prefer just to test it on a random vector to see if 
s/he is on the right track. 


[Exercise. Prove that for any $\mathbf{v}$, you get the same result for $S_{\sqrt{2}/2}(\mathbf{v})$ whether
you use coordinates in the $A$ basis or in the $B$ basis, thus confirming once-and-for-all
that the matrix of this scaling transformation is the same in both bases.]


2) Do we always get the same matrix? 


Certainly not — otherwise would I have burdened you with a formula for computing 
the matrix of a linear transformation in an arbitrary basis? 


Example 2 


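We compute the matrices for a projection transformation,

\[ P_{\hat{\mathbf{n}}}(\mathbf{v}) = (\mathbf{v}\cdot\hat{\mathbf{n}})\,\hat{\mathbf{n}}, \quad\text{where}\quad \hat{\mathbf{n}} = \begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix}, \]

in the two bases, above.


The A Matrix. I’ll use the common notation introduced earlier, $\hat{\mathbf{e}}_k$, for the $k$th
natural basis vector.

\[ P_{\hat{\mathbf{n}}}\big|_A = \begin{pmatrix} \langle \hat{\mathbf{e}}_1 \,|\, P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_1) \rangle & \langle \hat{\mathbf{e}}_1 \,|\, P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_2) \rangle \\ \langle \hat{\mathbf{e}}_2 \,|\, P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_1) \rangle & \langle \hat{\mathbf{e}}_2 \,|\, P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_2) \rangle \end{pmatrix}. \]

Since

\[ P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_1) = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \end{pmatrix} \quad\text{and}\quad P_{\hat{\mathbf{n}}}(\hat{\mathbf{e}}_2) = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \end{pmatrix} \]

(exercise: prove it), then

\[ P_{\hat{\mathbf{n}}}\big|_A = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \]

(exercise: prove it).


The B Matrix. Now use the $B$ basis.

\[ P_{\hat{\mathbf{n}}}\big|_B = \begin{pmatrix} \langle \mathbf{b}_1 \,|\, P_{\hat{\mathbf{n}}}(\mathbf{b}_1) \rangle & \langle \mathbf{b}_1 \,|\, P_{\hat{\mathbf{n}}}(\mathbf{b}_2) \rangle \\ \langle \mathbf{b}_2 \,|\, P_{\hat{\mathbf{n}}}(\mathbf{b}_1) \rangle & \langle \mathbf{b}_2 \,|\, P_{\hat{\mathbf{n}}}(\mathbf{b}_2) \rangle \end{pmatrix}. \]

Remember, for this to work, both $\mathbf{b}_k$ and $P_{\hat{\mathbf{n}}}(\mathbf{b}_k)$ have to be expressed in this same
basis, and we almost always express them in the $A$ basis, since that’s how we know
everything. Now,

\[ P_{\hat{\mathbf{n}}}(\mathbf{b}_1) = \begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix} \quad\text{and}\quad P_{\hat{\mathbf{n}}}(\mathbf{b}_2) = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

(exercise: prove it), so

\[ P_{\hat{\mathbf{n}}}\big|_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \]

(exercise: prove it), a very different matrix than $P_{\hat{\mathbf{n}}}\big|_A$.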


[Exercise. Using $(-3, 10)^t$ (or a general $\mathbf{v}$) confirm (prove) that the matrices


5.5 Some Special Linear Transformations for Quantum Mechanics

We will be using two special types of linear transformations frequently in quantum
computing, unitary and Hermitian operators. Each has its special properties that
are sometimes easier to state in terms of their matrices. For that, we need to define
matrix adjoints.


5.5.1 The Adjoint of a Matrix 


Given a complex $n \times m$ matrix, $M$, we can form an $m \times n$ matrix, $M^\dagger$, by taking $M$’s
conjugate transpose,

\[ M^\dagger \equiv (M^t)^*, \quad\text{or, in terms of components,}\quad (M^\dagger)_{jk} = M_{kj}^{\,*}. \]



In other words, we form the transpose of M, then take the complex conjugate of every 
element in that matrix. 
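A minimal Python sketch of the adjoint, assuming a matrix is stored as a list of rows; the function name `adjoint` and the sample matrix are illustrative choices, not taken from the text.

```python
def adjoint(M):
    """Conjugate transpose: entry (k, j) of the result is conj(M[j][k])."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[j][k]).conjugate() for j in range(rows)]
            for k in range(cols)]

# A hypothetical 2 x 3 complex matrix; its adjoint is 3 x 2.
M = [[1 + 1j, 2, 0],
     [3j, 4, 5 - 2j]]

M_dagger = adjoint(M)
```

Applying `adjoint` twice returns the original matrix, which is a quick way to check an implementation.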


Examples. 
teow. <3 1 i 1+i 0 6 
Op 205. ahr stia = 3 eB. 274 
6 7 88 nxn V7-Ti 88 
‘ 1+i 5 
Ty. 349) 1 99 a Pape. 295 
5 05. Sf! 7 


1 yA ere 
99 en 


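When either $m = 1$ or $n = 1$, the matrix is usually viewed as a vector. The adjoint
operation turns column vectors into row vectors (while also conjugating them), and
vice versa:

\[ \begin{pmatrix} 1 + i \\ \sqrt{2} - 2i \end{pmatrix}^{\dagger} = \left(1 - i,\ \sqrt{2} + 2i\right), \qquad \left(3 + 7i,\ \sqrt{2}\right)^{\dagger} = \begin{pmatrix} 3 - 7i \\ \sqrt{2} \end{pmatrix}. \]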

The Adjoint of a Linear Transformation 


Because linear transformations always have a matrix associated with them (once we 
have established a basis), we can carry the definition of adjoint easily over to linear 
transformations. 


Given a linear transformation, $T$, with matrix $M_T$ (in some agreed-upon basis), its
adjoint, $T^\dagger$, is the linear transformation defined by the matrix (multiplication) $M_T^{\,\dagger}$.
It can be stated in terms of its action on an arbitrary vector,

\[ T^\dagger(\mathbf{v}) \equiv (M_T)^\dagger \cdot \mathbf{v}, \quad \text{for all } \mathbf{v} \in V. \]

[Food for Thought. This definition requires that we have a basis established,
otherwise we can’t get a matrix. But is the adjoint of a linear transformation, in this
definition, going to be different for different bases? Hit the CS 83A discussion forums,
please.]


5.5.2 Unitary Operators 


Informal Definition. A unitary operator is a linear transformation that preserves 
distances. 


In a moment, we will make this precise, but first the big news. 




Quantum Logic Gates Are Unitary Operators 


[Circuit diagram: a vector $\mathbf{v}$ enters the gate $U$ and $U\mathbf{v}$ leaves it.]


Quantum computing uses a much richer class of logical operators (or gates) than 
classical computing. Rather than the relatively small set of gates: XOR, AND, 
NAND, etc., in classical computing, quantum computers have infinitely many different 
logic gates that can be applied. On the other hand, the logical operators of quantum 
computing are of a special form not required of classical computing; they must be 
unitary, i.e., reversible. 


Definition and Examples 


Theoretical Definition. A linear transformation, $U$, is unitary (a.k.a.
a unitary operator) if it preserves inner products, i.e.,

\[ \langle U\mathbf{v} \,|\, U\mathbf{w} \rangle = \langle \mathbf{v} \,|\, \mathbf{w} \rangle, \]

for all vectors $\mathbf{v}$, $\mathbf{w}$.


While this is a statement about the preservation of inner products, it has many
implications, one of which is that distances are preserved, i.e., for all $\mathbf{v}$, $\|U\mathbf{v}\| = \|\mathbf{v}\|$.


Theorem. The following characterizations of unitary transformations
are equivalent:


i) A linear transformation is unitary if it preserves the lengths of vectors:
$\|U\mathbf{v}\| = \|\mathbf{v}\|$.


ii) A linear transformation is unitary if it preserves inner products,
$\langle U\mathbf{v} \,|\, U\mathbf{w} \rangle = \langle \mathbf{v} \,|\, \mathbf{w} \rangle$.

iii) A linear transformation is unitary if its matrix (in an orthonormal
basis) has orthonormal rows (and columns).


Caution. This theorem is true, in part, because inner products are positive-definite,
i.e., $\mathbf{v} \ne \mathbf{0} \Rightarrow \|\mathbf{v}\| > 0$ (lengths of non-zero vectors are strictly positive).
However, we’ll encounter some very important pseudo inner products that are not
positive definite, and in those cases the three conditions will not be interchangeable.
More on that when we introduce the classical bit and quantum bit, a few lectures
hence.


A concrete example that serves as a classic specimen is a rotation, $R_\theta$, in any
Euclidean space, real or complex.


Notation. It is common to use the letter, $U$, rather than $T$, for a unitary operator
(in the absence of a more specific designation, like $R_\theta$).


Because of this theorem, we can use any of the three conditions as the definition 
of unitarity, and for practical reasons, we choose the third. 


Practical Definition #1. A linear transformation, $U$, is unitary (a.k.a. a
unitary operator) if its matrix (in any orthonormal basis) has orthonormal columns
(or equivalently orthonormal rows), i.e., $U$ is unitary $\iff$ for any orthonormal basis,

\[ B = \{\mathbf{b}_k\}_{k=1}^{n}, \]

the column vectors of $U\big|_B$ satisfy

\[ \langle U(\mathbf{b}_j) \,|\, U(\mathbf{b}_k) \rangle = \delta_{jk}. \]

Note that we only need to verify this condition for a single orthonormal basis (exercise,
below).

The Matrix of a Unitary Operator. A matrix, $M$, is called unitary if its
adjoint is also its inverse, i.e.,

\[ M^\dagger M = M M^\dagger = \mathbb{1}. \]

It is easily seen by looking at our practical definition (condition 3) of a unitary
operator that an operator is unitary $\iff$ its matrix is unitary. This leads to yet
another commonly stated, and equivalent, condition of a unitary operator.


Practical Definition #2. A unitary operator, $U$, is one in which

\[ U^\dagger U = U U^\dagger = \mathbb{1}. \]
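Practical Definition #2 suggests a direct numerical test. The sketch below is one possible check, assuming square matrices stored as lists of rows; `is_unitary`, `mat_mul`, the tolerance, and the four sample matrices (with arbitrary angles) are all illustrative choices. It tests only $M^\dagger M = \mathbb{1}$, which for a square matrix already forces $M M^\dagger = \mathbb{1}$.

```python
import cmath
import math

def adjoint(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[j][k]).conjugate() for j in range(len(M))]
            for k in range(len(M[0]))]

def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

def is_unitary(M, tol=1e-12):
    """True when M is square and M-dagger times M is the identity (within tol)."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    product = mat_mul(adjoint(M), M)
    return all(abs(product[j][k] - (1 if j == k else 0)) <= tol
               for j in range(n) for k in range(n))

theta, phi = 0.3, 1.1   # arbitrary sample angles
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
phase_change = [[cmath.exp(1j * theta), 0], [0, cmath.exp(1j * phi)]]
scaling = [[2, 0], [0, 2]]
projection = [[0.5, 0.5], [0.5, 0.5]]

results = [is_unitary(m) for m in (rotation, phase_change, scaling, projection)]
```

With these inputs the rotation and the phase change pass, while the scaling by 2 and the projection fail, since neither of those two preserves lengths.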


Some Unitary Operators 


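The 90° Rotations. The $\pi/2$ counter-clockwise rotation in $\mathbb{R}^2$,

\[ R_{\pi/2}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}, \]

is unitary, since its matrix in the preferred basis, $\{\hat{\mathbf{x}}, \hat{\mathbf{y}}\}$, is

\[ M_{R_{\pi/2}} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \]

and it is easily seen that the columns are orthonormal.


General $\theta$ Rotations. In a previous exercise, you showed that the $\theta$ counter-clockwise
rotation in $\mathbb{R}^2$ was, in the standard basis,

\[ R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

Dotting the two columns, we get

\[ \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \cdot \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} = -\cos\theta\sin\theta + \sin\theta\cos\theta = 0, \]

and dotting a column with itself (we’ll just do the first column to demonstrate) gives

\[ \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \cdot \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \cos^2\theta + \sin^2\theta = 1, \]

so, again, we get orthonormality.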


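Phase Changes. A transformation which only modifies the arg of each coordinate,

\[ \Pi_{\theta,\phi} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} e^{i\theta} x \\ e^{i\phi} y \end{pmatrix} , \]

has the matrix

\[ \Pi_{\theta,\phi} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{i\phi} \end{pmatrix} . \]

Because this is a complex matrix, we have to apply the full inner-product machinery,
which requires that we not forget to take conjugates. The inner product of the two
columns is easy enough,

\[ \left\langle \begin{pmatrix} e^{i\theta} \\ 0 \end{pmatrix} \middle| \begin{pmatrix} 0 \\ e^{i\phi} \end{pmatrix} \right\rangle = e^{-i\theta} \cdot 0 + 0 \cdot e^{i\phi} = 0 , \]

but do notice that we had to take the complex conjugate of the first vector, even
though failing to have done so would have been an error that still gave us the right
answer. The inner product of a column with itself (we'll just show the first column)
gives

\[ \left\langle \begin{pmatrix} e^{i\theta} \\ 0 \end{pmatrix} \middle| \begin{pmatrix} e^{i\theta} \\ 0 \end{pmatrix} \right\rangle = e^{-i\theta} e^{i\theta} + 0 \cdot 0 = e^{0} = 1 , \]

and we have orthonormality. Once again, the complex conjugate was essential in the
computation.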
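To see why the conjugate matters, here is a small Python sketch that computes the inner product of the first phase-change column with itself both ways. The function names and the sample angles are assumptions of mine, chosen only for illustration; the convention (conjugate the left vector) is the one used in this lesson.

```python
import cmath

def inner(u, v):
    """Complex inner product <u|v>, conjugating the first (left) vector."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def naive_dot(u, v):
    """Dot product with the conjugate forgotten (wrong for complex vectors)."""
    return sum(a * b for a, b in zip(u, v))

theta, phi = 0.7, 2.1                    # arbitrary sample angles
col1 = [cmath.exp(1j * theta), 0j]       # first column of the phase-change matrix
col2 = [0j, cmath.exp(1j * phi)]         # second column

print(abs(inner(col1, col2)))   # orthogonality: 0
print(inner(col1, col1))        # correct norm squared: 1, up to rounding
print(naive_dot(col1, col1))    # conjugate forgotten: e^{2 i theta}, not 1
```

Without the conjugate the "length squared" of the column comes out as a complex number on the unit circle, which is not even a legal length.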


Some Non-Unitary Operators 


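Scaling by a Non-Unit Vector. Scaling by a complex (or real) $c$, with $|c| \neq 1$, has
the matrix

\[ S_c = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix} , \]

whose columns are orthogonal (do it), but whose column vectors are not unit length,
since

\[ \left\langle \begin{pmatrix} c \\ 0 \end{pmatrix} \middle| \begin{pmatrix} c \\ 0 \end{pmatrix} \right\rangle = c^* c + 0 \cdot 0 = |c|^2 \neq 1 , \]

by construction.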


Note. In our projective Hilbert spaces, such transformations don’t really exist, 
since we consider all vectors which differ by a scalar multiple to be the same entity 
(state). While this example is fine for learning, and true for non-projective Hilbert 
spaces, it doesn’t represent a real operator in quantum mechanics. 


A Projection Operator. Another example we saw a moment ago is the projection
onto a vector's 1-dimensional subspace in $\mathbb{R}^2$ (although this will be true of such
a projection in $\mathbb{R}^n$, $n > 2$). That was

\[ P_{\hat{n}}(v) \equiv (v \cdot \hat{n})\, \hat{n} , \quad \text{where} \quad \hat{n} = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} . \]

We only need look at $P_{\hat{n}}$ in either of the two bases for which we computed its
matrices to see that this is not unitary. Those are done above, but you can fill in a
small detail:

[Exercise. Show that both matrices for this projection operator fail to have
orthonormal columns.]


A Couple Loose Ends 


[Exercise. Prove that, for orthonormal bases $\mathcal{A}$ and $\mathcal{B}$, if $U|_{\mathcal{A}}$ has orthonormal
columns, then $U|_{\mathcal{B}}$ does, too. Thus, the matrix condition for unitarity is independent
of orthonormal basis.]

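[Exercise. Prove that it is not true that a unitary $U$ will have a matrix with
orthonormal columns for all bases. Do this by providing a counter-example as follows.

1. Use the rotation $R_{\pi/2}$ in $\mathbb{R}^2$ and the non-orthogonal basis

\[ \mathcal{D} = \left\{ \begin{pmatrix} 2 \\ 0 \end{pmatrix} , \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\} = \{ d_1, d_2 \} . \]

2. Write down the matrix for $R_{\pi/2}|_{\mathcal{D}}$ using the previously developed formula, true
for all bases,

\[ R_{\pi/2}\big|_{\mathcal{D}} = \Big( \big[ R_{\pi/2}(d_1) \big]\big|_{\mathcal{D}} \; , \; \big[ R_{\pi/2}(d_2) \big]\big|_{\mathcal{D}} \Big) , \]

whose four components are, of course, unknown to us. We'll express them
temporarily as

\[ R_{\pi/2}\big|_{\mathcal{D}} = \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix} . \]

3. Computing the matrix in the previous step requires a little calculation: you
cannot use the earlier "trick"

\[ T_{jk} = \langle\, b_j \mid T(b_k) \,\rangle , \]

since that formula only applies to an orthonormal basis $\mathcal{B}$, which $\mathcal{D}$ is not.
Instead, you must solve some simple equations. To get the matrix for $R_{\pi/2}|_{\mathcal{D}}$,
find the column vectors. The first column is $\big[ R_{\pi/2} (2, 0)^t \big]\big|_{\mathcal{D}}$ ($\mathcal{D}$-coordinates
taken after we have applied $R_{\pi/2}$). You can easily confirm that (in natural
coordinates),

\[ R_{\pi/2} \begin{pmatrix} 2 \\ 0 \end{pmatrix}\bigg|_{\mathcal{A}} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}\bigg|_{\mathcal{A}} \]

by drawing a picture. Now, get the $\mathcal{D}$-coordinates of this vector by solving the
$\mathcal{A}$-coordinate equation,

\[ \begin{pmatrix} 0 \\ 2 \end{pmatrix} = \alpha \begin{pmatrix} 2 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]

for $\alpha$ and $\beta$.

4. Even though this will immediately tell you that $R_{\pi/2}|_{\mathcal{D}}$ is not orthonormal, go
on to get the full matrix by solving for the second column, $(\gamma, \delta)^t$, and showing
that neither column is normalized and the two columns are not orthogonal.]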


5.5.3. Hermitian Operators 


While unitary operators are the linear transformations that can be used to represent
quantum logic gates, there is another type of operator associated with the measurements
of a quantum state. These are called Hermitian operators.

Preview Characterization. In quantum mechanics, a Hermitian operator will
be an operator that is associated with some observable, that is, something about
the system, such as velocity, momentum, or spin, that we can imagine measuring. For
quantum computer scientists, the observable will be the state of a quantum bit.

We will fully explore the connection between Hermitian operators and observables
in the very next lesson (quantum mechanics), but right now let's at least see the
mathematical definition of a Hermitian operator.




Definitions and Examples 


Hermitian Matrices. A matrix, $M$, which is equal to its adjoint is
called Hermitian, i.e.,

\[ M \text{ is Hermitian} \iff M^\dagger = M . \]

[Exercise. Show that the matrices 
a : ; ; i ; 
ie) ©25.. “7a and 
V7-Ti 88 ae eae 
é 00 0 fn 
are Hermitian.]
[Exercise. Explain why the matrices 
1+2 0 6 1 O 3. 6«CO0 
3 —2.5 —71 and 0 2 -1 0 
na V7—-Ti 88 aS 2 


are not Hermitian.]


[Exercise. Explain why a Hermitian matrix has real elements along its diagonal.]


Hermitian Operators. A Hermitian operator is a linear transformation,
$T$, whose matrix, $M_T$ (in any orthonormal basis), is Hermitian.

The definition of Hermitian operator implies that the orthonormal basis chosen does not matter:
either all of $T$'s matrices in all orthonormal bases will be Hermitian or none will be. This is a fact,
but we won't bother proving it.
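The matrix definition is easy to test mechanically. Below is a minimal Python sketch that forms the adjoint (conjugate transpose) and compares it with the original matrix. The function names and the three sample matrices are hypothetical; they are not the matrices from the exercises above.

```python
def adjoint(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[complex(m[c][r]).conjugate() for c in range(n)] for r in range(n)]

def is_hermitian(m, tol=1e-12):
    """True if the matrix equals its own adjoint, entry by entry."""
    a = adjoint(m)
    n = len(m)
    return all(abs(a[r][c] - m[r][c]) <= tol for r in range(n) for c in range(n))

# Hypothetical sample matrices
h = [[2, 1 - 3j],
     [1 + 3j, -5]]          # off-diagonal entries are conjugates: Hermitian
not_h = [[2, 1 - 3j],
         [1 - 3j, -5]]      # symmetric but not conjugate-symmetric
complex_diag = [[1j, 0],
                [0, 1]]     # a non-real diagonal entry rules it out

print(is_hermitian(h), is_hermitian(not_h), is_hermitian(complex_diag))
```

The third sample is a hint for the diagonal exercise: a diagonal entry must equal its own conjugate.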


Hermitian operators play a starring role in quantum mechanics, as we’ll see shortly. 


5.6 Enter the Quantum World 


Well, you did it. After a few weeks of intense but rewarding math, you are ready to
learn some fully caffeinated quantum mechanics. I'll walk you through it in a single
chapter that will occupy us for about a week, at which time you'll be a fully certified
quantum "mechanic."




Chapter 6 


The Experimental Basis of 
Quantum Computing 


6.1 The Physical Underpinning for Spin 1/2 Quantum Mechanics


This is the first of a three-chapter lesson on quantum mechanics. For our purposes 
in CS 83A, only the second of the three — the next chapter — contains the essential 
formalism that we’ll be using in our algorithms. However, a light reading of this first 
chapter will help frame the theory that comes next. 


6.2 Physical Systems and Measurements 


6.2.1 Quantum Mechanics as a Model of Reality 


Quantum mechanics is not quantum physics. Rather, it is the collection of math- 
ematical tools used to analyze physical systems which are, to the best of anyone’s 
ability to test, known to behave according to the laws of quantum physics. 


As computer scientists, we will not concern ourselves with how the engineers 
implement the physical hardware that exhibits predictable quantum behavior (any 
more than we needed to know how they constructed a single classical bit capable of 
holding a 1 or a 0 out of beach sand in order to write classical software or algorithms). 
Certainly this is of great interest, but it won’t interfere with our ability to play our 
part in moving the field of quantum information forward. As of this writing, we don’t 
know which engineering efforts will result in the first generation of true quantum 
hardware, but our algorithms should work regardless of the specific solution they end 
up dropping at our doorstep. 


That said, a brief overview of the physics will provide some stabilizing terra firma 
to which the math can be moored. 




6.2.2 The Physical System, $\mathscr{S}$


Here's the set-up. We have a physical system, call it $\mathscr{S}$ (that's a script "S"), and an
apparatus that can measure some property of $\mathscr{S}$. Also, we have it on good authority
that $\mathscr{S}$ behaves according to quantum weirdness; 100 years of experimentation has
confirmed certain things about the behavior of $\mathscr{S}$ and its measurement outcomes.
Here are some examples; the last few may be unfamiliar to you, but I will elaborate
shortly.


• The system is a proton and the measurement is the velocity ($\propto$ momentum) of
the proton.

• The system is a proton and the measurement is the position of the proton.

• The system is a hydrogen atom (one proton + one electron) and the measurement
is the potential energy state of the atom.

• The system is an electron and the measurement is the z-component magnitude
of the electron's spin.

• The system is an electron and the measurement is the x-component magnitude
of the electron's spin.

• The system is an electron and the measurement is the magnitude of the electron's
spin projected onto the direction $\hat{n}$.


In each case, we are measuring a real number that our apparatus somehow is 
capable of detecting. In practice, the apparatus is usually measuring something related 
to our desired quantity, and we follow that with a computation to get the value of 
interest (velocity, momentum, energy, z-component of spin, etc.). 


6.2.3. Electron Spin as a Testbed for Quantum Mechanics 


One usually studies momentum and position as the relevant measurable quantities, 
especially in a first course in quantum mechanics. For us, an electron’s spin will be 
a better choice. 


A Reason to Learn Quantum Mechanics Through Spin 


The Hilbert spaces that we get if we measure momentum or position are infinite di- 
mensional vector spaces, and the corresponding linear combinations become integrals 
rather than sums. We prefer to avoid calculus in this course. For spin, however, our 
vector spaces are two dimensional — about as simple as they come. Sums work great. 




A Reason Computer Scientists Need to Understand Spin 


The spin of an electron, which is known as a spin 1/2 particle, has exactly the kind
of property that we can incorporate into our algorithms. It will have classical aspects
that allow it to be viewed as a classical bit (0 or 1), yet it still has quantum-mechanical
aspects that enable us to process it while it is in an "in-between" state: a mixture of 0
and 1.


Spin Offers a Minor Challenge 


Unlike momentum or position, spin is not part of common vernacular, so we have 
to spend a moment defining it (in a way that will necessarily be incomplete and 
inaccurate since we don’t have a full 12 weeks to devote to the electromagnetism, 
classical mechanics and quantum physics required to do it justice). 


6.3. A Classical Attempt at Spin 1/2 Physics 


6.3.1 An Imperfect Picture of Spin 
The Scalar and Vector Sides of Spin 


Spin is a property that every electron possesses. Some properties like charge and
mass are the same for all electrons, while others like position and momentum vary
depending on the electron in question and the exact moment at which we measure.
The spin (or, to use the more accurate term, spin state) of an electron has aspects of both.
There is an overall magnitude associated with an electron's spin state that does not
change. It is represented by the number 1/2, a value shared by all electrons at all
times. But then each electron can have its own unique vector orientation that varies
from electron-to-electron or moment-to-moment. We'll sneak up on the true quantum
definition of these two aspects of quantum spin in steps, first by thinking of an
electron using inaccurate but intuitive imagery, and we'll make adjustments as we
perform experiments that progressively force us to change our attitude.


Electromagnetic Fields’ Use in Learning About Spin States 


Electrons appear to interact with external electromagnetic fields as if they were
tiny, negatively charged masses rotating at a constant speed about an internal axis.
While they truly are negatively charged and massive, no actual rotation takes place. 
Nevertheless, the math resulting from this imagery can be used as a starting point to 
make predictions about how they will behave in such a field. The predictions won’t 
be right initially, but they'll lead us to better models. 




A Useful Mental Image of Spin 


We indulge our desire to apply classical physics and — with an understanding that 
it’s not necessarily true — consider the electron to be a rotating, charged body. Such 
an assumption would imbue every electron with an intrinsic angular momentum that 
we call spin. If you have not studied basic physics, you can imagine a spinning top; it 
has a certain mass distribution, spins at a certain rate (frequency) and its rotational 
axis has a certain direction. Combining these three things into a single vector, we 
end up defining the angular momentum of the top or, in our case, the spin of the 


electron. (See Figure 6.1.) 


Figure 6.1: Classical angular momentum 


6.3.2 A Naive Quantitative Definition of Electron Spin 


Translating our imperfect model into the language of math, we initially define an
electron's spin (state) to be the vector $\mathbf{S}$ which embodies two ideas,

1. first, its quantity of angular momentum (how "heavy" it is combined with how
fast it's rotating), which I will call $S$, and

2. second, the orientation (or direction) of its imagined rotational axis, which I
will call $\hat{n}_S$ or sometimes just $\hat{S}$.

The first entity, $S$, is a scalar. The second, $\hat{n}_S$, can be represented by a unit vector
that points in the direction of the rotational axis (where we adjudicate up vs. down
by a "right hand rule," which I will let you recall from any one of your early math
classes). (See Figure 6.2.)


Figure 6.2: A classical idea for spin: A 3-D direction and a scalar magnitude

So, the total spin vector will be written

\[ \mathbf{S} = \begin{pmatrix} S_x \\ S_y \\ S_z \end{pmatrix} , \]

and we can break it into the two aspects, its scalar magnitude,

\[ S \equiv |\mathbf{S}| = \sqrt{S_x^2 + S_y^2 + S_z^2} , \]

and a unit vector that embodies only its orientation (direction)

\[ \hat{n}_S \equiv \hat{S} = \frac{\mathbf{S}}{|\mathbf{S}|} . \]


$S$, the spin magnitude, is the same for all electrons under all conditions. For the
record, its value is $\frac{\sqrt{3}}{2}\hbar$, where $\hbar$ is a tiny number known as Planck's constant. (In
street terminology that's "spin 1/2.") After today, we won't rely on the exact expression,
but for now we will keep it on the books by making explicit the relationship

\[ S = \frac{\sqrt{3}}{2}\, \hbar . \]

The constancy of its magnitude leaves the electron's spin orientation, $\hat{S}$, as the only
spin-related entity that can change from moment-to-moment or electron-to-electron.


6.3.3. Spherical Representation 


Figure 6.3: Polar and azimuthal angles for the (unit) spin direction 


You may have noticed that we don't need all three components $n_x$, $n_y$ and $n_z$.
Since $\hat{n}_S$ is unit length, the third can be derived from the other two. [Exercise.
How?] A common way to express spin direction using only two real numbers is
through the so-called polar and azimuthal angles, $\theta$ and $\phi$ (See Figure 6.3).

In fact, spherical coordinates provide an alternate means of expressing any vector
using these two angles plus the vector's length, $r$. In this language a vector can
be written $(r, \theta, \phi)_{\text{Sph}}$, rather than the usual $(x, y, z)$. (The subscript "Sph" is
usually not shown if the context makes clear we are using spherical coordinates.) For
example, 


Sph 


In the language of spherical coordinates the vector $\hat{n}_S$ is written $(1, \theta, \phi)_{\text{Sph}}$, where
the first coordinate is always 1 because $\hat{n}_S$ has unit length. The two remaining
coordinates are the ones we just defined, the angles depicted in Figure 6.3.

$\theta$ and $\phi$ will be important alternatives to the Euclidean coordinates $(n_x, n_y, n_z)$,
especially as we study the Bloch sphere, density matrices and mixed states, topics in
the next quantum computing course, CS 83B.
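For readers who like to compute, here is a short Python sketch of the conversion between the two coordinate systems. It assumes the usual physics convention that matches the description above: the polar angle is measured down from the +z axis and the azimuthal angle is measured in the x-y plane from the +x axis. The function names are mine, not part of any library.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """theta: polar angle from +z; phi: azimuthal angle from +x in the x-y plane."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)

def cartesian_to_spherical(x, y, z):
    """Inverse conversion; requires a non-zero vector."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return (r, theta, phi)

# A unit spin direction (1, theta, phi)_Sph with sample angles
n = spherical_to_cartesian(1.0, math.radians(55), math.radians(30))
print(n)
print(math.sqrt(sum(c * c for c in n)))   # length stays 1, up to rounding
```

Because the length is pinned at 1, the two angles really do carry all of the information in the three Euclidean components.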


6.4 Refining Our Model: Experiment #1 


We proceed along classical grounds and, with the aid of some experimental physicists, 
design an experiment. 


6.4.1 The Experiment 


We prepare a bunch of electrons in equiprobable random states, take some mea- 
surements of their spin states and compare the result with what we would expect 
classically. Here are the details: 


1. The States. Let’s instruct our experimental physicists to prepare an electron 
soup: billions upon billions of electrons in completely random spin orientations. 


No one direction (or range of directions) should be more represented than any 
other. 


Figure 6.4: A soup of electrons with randomly oriented spins

2. The Measurement. Even classically there is no obvious way to measure the
entire 3-D vector $\mathbf{S}$ at once; we have to detect the three scalar components
individually. Therefore, we'll ask the physicists to measure only the real valued
component $S_z$, the projection of $\mathbf{S}$ onto the z-axis. They assure us they can do
this without subjecting the electrons to any net forces that would (classically)
modify the z-component of $\mathbf{S}$.

To aid the visualization, we imagine measuring each electron one-at-a-time and
noting the z-component of spin after each "trial." (See Figure 6.5.)

Figure 6.5: The z-projection of one electron's spin


3. The Classical Expectation. The results are easy to predict classically since
the length of the spin is fixed at $\frac{\sqrt{3}}{2}\hbar$, and the electrons are oriented randomly;
we expect $S_z$ to vary between $+\frac{\sqrt{3}}{2}\hbar$ and $-\frac{\sqrt{3}}{2}\hbar$. For example, in one extreme
case, we could find

↑ an electron whose spin is oriented "straight up," i.e., in the positive z-direction
with $S_z = +\frac{\sqrt{3}}{2}\hbar$, possessing 100% of the vector's length.

Similar logic implies the detection of other electrons

↓ pointing straight down ($S_z = -\frac{\sqrt{3}}{2}\hbar$),
→ lying entirely in the x-y plane ($S_z = 0$),
↗ pointing mostly up ($S_z = +.8\hbar$),
↘ slightly down ($S_z = -.1\hbar$),

etc.

Figure 6.6: The classical range of z-projection of spin
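The classical prediction can be imitated with a few lines of Python. This is only a sketch of the classical (and, as the experiment will show, incorrect) expectation: it draws spin directions uniformly over the sphere and records the z-projection of a vector of fixed length $\frac{\sqrt{3}}{2}$, working in units of $\hbar$. The function names, the seed, and the sample size are arbitrary choices of mine.

```python
import math
import random

S = math.sqrt(3) / 2      # classical spin magnitude, in units of hbar

def random_unit_vector(rng):
    """Uniformly random direction: normalize three independent Gaussian samples."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def classical_sz_samples(n, seed=0):
    """z-projections of n randomly oriented classical spins of length S."""
    rng = random.Random(seed)
    return [S * random_unit_vector(rng)[2] for _ in range(n)]

samples = classical_sz_samples(10000)
print(min(samples), max(samples))     # spread across nearly the whole range [-S, +S]
```

Classically, then, a histogram of the readings would be smeared across the whole interval rather than concentrated at a couple of values.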




6.4.2 The Actual Results 


We get no such cooperation from nature. In fact, we only (and always) get one of two
z-values of spin: $S_z = +\hbar/2$ and $S_z = -\hbar/2$. Furthermore, the two readings appear
to be somewhat random, occurring with about equal likelihood and no pattern:

Figure 6.7: The measurements force $S_z$ to "snap" into one of two values.


6.4.3 The Quantum Reality 


There are two surprises here, each revealing its own quantum truth that we will be 
forced to accept. 


• Surprise #1. There are infinitely many quantum spin states available for electrons
to secretly experience. Yet when we measure the z-component of spin, the
uncooperative particle always reports that this value is either $(+\frac{\hbar}{2})$ or $(-\frac{\hbar}{2})$,
each choice occurring with about equal likelihood. We will call the z-component
of spin the observable $S_z$ ("observable" indicating that we can measure it) and
accept from the vast body of experimental evidence that measuring the observable
$S_z$ forces the spin state to collapse such that its $S_z$ "snaps" to either one
of the two allowable values called eigenvalues of the observable $S_z$. We'll call
the $(+\frac{\hbar}{2})$ outcome the +z outcome (or just the (+) outcome) and the $(-\frac{\hbar}{2})$
outcome the -z (or the (-)) outcome.


• Surprise #2. Even if we can somehow accept the collapse of the infinity of
random states into the two measurable ones, we cannot help but wonder why the
electron's projection onto the z-axis is not the entire length of the vector, that
is, either straight up at $(+\frac{\sqrt{3}}{2})\hbar$ or straight down at $(-\frac{\sqrt{3}}{2})\hbar$. The electron
stubbornly wants to give us only a fraction of that amount, $\approx 58\%$. This
corresponds to two groups: the "up group," which forms the angle $\theta \approx 55°$
(0.955 rad) with the positive z-axis, and the "down group," which forms that same
55° angle with the negative z-axis. The explanation for not being able to get
a measurement that has the full length $(\frac{\sqrt{3}}{2})\hbar$ is hard to describe without a
more complete study of quantum mechanics. Briefly, it is due to the Heisenberg
uncertainty principle. If the spin were to collapse to a state that was any closer
to the vertical ±z-axis, we would have too much simultaneous knowledge about
its x- and y-components (too close to 0) and its z-component (too close to
$(\pm\frac{\sqrt{3}}{2})\hbar$). (See Figure 6.8.) This would violate Heisenberg, which requires
the combined variation of these observables be larger than a fixed constant.
Therefore, $S_z$ must give up some of its claim on the full spin magnitude, $(\frac{\sqrt{3}}{2})\hbar$,
resulting in a shorter $S_z$, specifically $\frac{1}{2}\hbar$.

Figure 6.8: Near "vertical" spin measurements give illegally accurate knowledge of
$S_x$, $S_y$ and $S_z$.
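The 58% and 55° figures follow directly from the two lengths involved, the measured $S_z = \frac{1}{2}\hbar$ and the full magnitude $S = \frac{\sqrt{3}}{2}\hbar$. Here is a quick arithmetic check in Python, working in units of $\hbar$ (the variable names are mine):

```python
import math

S = math.sqrt(3) / 2      # full spin magnitude, in units of hbar
S_z = 0.5                 # measured z-component, in units of hbar

fraction = S_z / S                 # share of the full length seen along z
theta = math.acos(fraction)        # angle between the spin and the +z axis

print(round(fraction, 3), round(math.degrees(theta), 1), round(theta, 3))
# prints: 0.577 54.7 0.955
```

The fraction is $1/\sqrt{3}$, which rounds to the 58% quoted above, and the angle rounds to 55°.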


6.4.4 A Follow-Up to Experiment #1 


Before we move on to a new experimental set-up, we do one more test. The physicists
tell us that a side-effect of the experimental design was that the electrons which
measured a +z are now separated from those that measured -z, and without building
any new equipment, we can subject each group to a follow-up $S_z$ measurement. The
follow-up test, like the original, will not add any forces or torques that could change
$\mathbf{S}$ (although we are now beginning to wonder, given the results of the first set of
trials). (See Figure 6.9.)


6.4.5 Results of the Follow-Up 


We are reassured by the results. All of the electrons in the (+) group faithfully
measure +z, whether done immediately after the first test or after a period of time.
Similarly, the $S_z$ value of the (-) group is unchanged; those rascals continue in their
down-z orientation. We have apparently created two special states for the electrons
which are distinguishable and verifiable. (See Figure 6.10.)

Figure 6.10: Results of follow-up measurements of $S_z$ on the +z ($|+\rangle$) and -z ($|-\rangle$)
groups.

$|+\rangle$ and $|-\rangle$ States. We give a name to the state of the electrons in the (+)
group: we call it the $|+\rangle_z$ state (or simply the $|+\rangle$ state, since we consider the z-axis
to be the preferred axis in which to project the spin). We say that the (-) group is
in the $|-\rangle_z$ (or just the $|-\rangle$) state. Verbally, these two states are pronounced "plus
ket" and "minus ket."


6.4.6 Quantum Mechanics Lesson #1 


Either by intuition or a careful study of the experimental record, we believe there
are infinitely many spin values represented by the electrons in our random electron
soup prepared by the physicists, yet when we measure the $S_z$ component, each state
collapses into one of two possible outcomes. Repeated testing of $S_z$ after the first
collapse confirms that the collapsed state is stable: we have prepared the electrons
in a special spin-up or spin-down state, $|+\rangle$ or $|-\rangle$.

$|+\rangle_x$ and $|-\rangle_x$ States. Certainly this would also be true had we projected
the spin vector onto the x-axis (measuring $S_x$) or the y-axis (measuring $S_y$). For
example, if we had tested $S_x$, we would have collapsed the electrons into a different
pair of states, $|+\rangle_x$ and $|-\rangle_x$ (this time, we need the subscript $x$ to distinguish it from
our preferred z-axis states $\{|+\rangle, |-\rangle\}$).

What's the relationship between the two z-states, $\{|+\rangle, |-\rangle\}$, and the x-states,
$\{|+\rangle_x, |-\rangle_x\}$? We'll need another experiment.


6.4.7 First Adjustment to the Model 


The imagery of a spinning charged massive object has to be abandoned. Either our
classical physics doesn't work at this subatomic level or the mechanism of electron
spin is not what we thought it was, or both. Otherwise, we would have observed
a continuum of $S_z$ values in our measurements. Yet the experiments that we
have been doing are designed to measure spin as if it were a 3-dimensional vector, and
the electrons are responding to something in our apparatus during the experiments,
so while the model isn't completely accurate, the representation of spin by a vector,
$\mathbf{S}$ (or equivalently, two angular parameters, $\theta$ and $\phi$, and one scalar $S$) may still have
some merit. We're going to stop pretending that anything is really spinning, but we'll
stick with the vector quantity $\mathbf{S}$ and its magnitude $S$, just in case there's some useful
information in there.


6.5 Refining Our Model: Experiment #2


The output of our first experiment will serve as the input of our second. (See Figure 6.11.)

Figure 6.11: Experiment #2: $|+\rangle_z$ electrons enter an $S_x$ measurement apparatus.


6.5.1 The Experiment 


We will take only the electrons that collapsed to $|+\rangle$ as input to a new apparatus. We
observed that this was essentially half of the original group, the other half consisting
of the electrons that collapsed into the state $|-\rangle$ (which we now throw away).

Our second apparatus is going to measure the x-axis projection. Let's repeat this
using the same bullet-points as we did for the first experiment.

1. The States. The input electrons are in a specific state, $|+\rangle$, whose z-spins always
point (as close as possible to) up. This is in contrast to the first experiment
where the electrons were randomly oriented.




Figure 6.12: The input state for experiment # 2 


2. The Measurement. Instead of measuring $S_z$, we measure $S_x$. We are curious
to know whether electrons that start in the z-up state have any x-projection
preference. We direct our physicists to measure only the real valued component
$S_x$, the projection of $\mathbf{S}$ onto the x-axis.

Figure 6.13: The x-projection of spin

3. The Classical Expectation. Clinging desperately to those classical ideas which
have not been ruled out by the first experiment, we imagine $S_x$ and $S_y$ to be in any
relative amounts that complement $|S_z|$, now firmly fixed at $\frac{\hbar}{2}$. This would allow
values for those two such that $\sqrt{S_x^2 + S_y^2 + (\hbar/2)^2} = \frac{\sqrt{3}}{2}\hbar$. If true, prior to this
second measurement $S_x$ would be smeared over a range of values. (See Figure 6.14.)

Figure 6.14: Viewed from top left, the classical range of x-projection of spin.




6.5.2 The Actual Results 


As before, our classical expectations are dashed. We get one of two x-values of spin:
$S_x = +\hbar/2$ or $S_x = -\hbar/2$. And again the two readings occur randomly with near
equal probability.

Also, when we subject each output group to further $S_x$ tests, we find that after the
first $S_x$ collapse each group is locked in its own state, as long as we only test $S_x$.


6.5.3 The Quantum Reality 


Our first experiment prepared us to expect the "non-classical." Apparently, when an
electron starts out in a $|+\rangle_z$ spin state, an $S_x$ measurement causes it to collapse into
a state such that we only "see" one of two allowable eigenvalues of the observable $S_x$,
namely $+\frac{\hbar}{2}$ (the (+) outcome) and $-\frac{\hbar}{2}$ (the (-) outcome).

Despite the quantum results, we believe we have at least isolated two groups of
electrons, both in the $|+\rangle$ (z-up) state, but one in the $|-\rangle_x$ (x-"down") state and the
other in the $|+\rangle_x$ (x-"up"). (See Figure 6.15.) However, we had better test this belief.

Figure 6.15: A guess about the states of two groups after experiment #2


6.5.4 A Follow-Up to Experiment #2 


Using the two detectors the physicists built, we home in on one of the two output
groups of experiment #2, choosing (for variety) the $|-\rangle_x$ group. We believe it to
contain only electrons with $S_z = (+)$ and $S_x = (-)$. In case you are reading too
fast, that's z-up and x-down. To confirm, we run this group through a third test that
measures $S_z$ again, fully anticipating a boring run of (+) readings.




6.5.5 Results of the Follow-Up 


Here is what we find: 


We are speechless. There are now equal numbers of z-up and z-down spins in a group
of electrons that we initially selected from a purely z-up pool. (See Figure 6.16.)
Furthermore, the physicists assure us that nothing in the second apparatus could
have "turned the electrons" vis-à-vis the z-component; there were no net z-direction
forces.

Figure 6.16: $|-\rangle$ electrons emerging from a group of $|+\rangle$ electrons


6.5.6 Quantum Mechanics Lesson #2


Even if we accept the crudeness of only two measurable states per observable, we
still can't force both $S_z$ and $S_x$ into specific states at the same time. Measuring $S_x$
destroys all information about $S_z$. (See Figure 6.17.)

After more experimentation, we find this to be true of any pair of distinct directions,
$S_x$, $S_y$, $S_z$, or even $S_{\hat{n}}$, where $\hat{n}$ is some arbitrary direction onto which
we project $\mathbf{S}$. There is no such thing as knowledge of three independent directional
components such as $S_x$, $S_y$, and $S_z$. The theorists will say that $S_z$ and $S_x$ (and any
such pair of different directions) are incompatible observables. Even if we lower our
expectations to accept only discrete (in fact binary) outcomes $|+\rangle$, $|+\rangle_x$, $|-\rangle_y$, etc.,
we still can't know or prepare two at once.

[Exception. If we select for $|+\rangle_z$ (or any $|+\rangle_{\hat{n}}$) followed by a selection for its
polar opposite, $|-\rangle_z$ (or $|-\rangle_{\hat{n}}$), there will be predictably zero electrons in the final
output.]


Figure 6.17: The destruction of $|+\rangle$ $S_z$ information after measuring $S_x$


Preview 


As we begin to construct a two-dimensional Hilbert space to represent spin, we already
have a clue about what all this might mean. Since the $|-\rangle_x$ output of the second
apparatus was found (by the third apparatus) to contain portions of both $|+\rangle$ and
$|-\rangle$, it might be expressible by the equation

\[ |-\rangle_x = c_+ |+\rangle + c_- |-\rangle , \]

where $c_+$ and $c_-$ are scalar "weights" which express how much $|+\rangle$ and $|-\rangle$ constitute
$|-\rangle_x$. Furthermore, it would make sense that the scalars $c_+$ and $c_-$ have equal magnitude
if they are to reflect the observed (roughly) equal number of z-up and z-down
spins detected by the third apparatus when testing the $|-\rangle_x$ group.

We can push this even further. Because we are going to be working with normalized
vectors (recall the projective sphere in Hilbert space?), it will turn out that their
common magnitude will be $1/\sqrt{2}$.

For this particular combination of vector states,

\[ |-\rangle_x = \frac{|+\rangle - |-\rangle}{\sqrt{2}} . \]

In words (that we shall make precise in a few moments), the $|-\rangle_x$ vector can be
expressed as a linear combination of the $S_z$ vectors $|+\rangle$ and $|-\rangle$. This hints at the
idea that $|+\rangle$ and $|-\rangle$ form a basis for a very simple 2-dimensional Hilbert space, the
foundation of all quantum computing.
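Here is a small Python sketch of that linear combination, $(|+\rangle - |-\rangle)/\sqrt{2}$, written in z-basis coordinates. It leans on an assumption that the next chapter makes precise: the squared magnitude of each weight gives the fraction of z-up or z-down readings. The variable and function names are mine.

```python
import math

plus = (1.0, 0.0)     # |+> in z-basis coordinates
minus = (0.0, 1.0)    # |->

c = 1 / math.sqrt(2)  # common magnitude of the two weights
# |->_x = (|+> - |->) / sqrt(2), built component by component
minus_x = tuple(c * p - c * m for p, m in zip(plus, minus))

def inner(u, v):
    """Inner product <u|v>, conjugating the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

norm_sq = inner(minus_x, minus_x).real      # should be 1 (normalized)
p_up = abs(inner(plus, minus_x)) ** 2       # assumed fraction of z-up readings
p_down = abs(inner(minus, minus_x)) ** 2    # assumed fraction of z-down readings

print(norm_sq, p_up, p_down)                # 1, 1/2, 1/2 up to rounding
```

The two squared weights come out equal, matching the roughly even split the third apparatus reported.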


6.5.7 Second Adjustment to the Model 


Every time we measure one of the three scalar coordinates $S_x$, $S_y$, or $S_z$, information
we thought we had about the other two seems to vanish. The three dimensional vector
$\mathbf{S}$ is crumbling before our eyes. In its place, a new model is emerging: that of a two
dimensional vector space whose two basis vectors appear to be the two z-spin states
$|+\rangle$ and $|-\rangle$ which represent a quantum z-up and quantum z-down, respectively. This
is a difficult transition to make, and I'm asking you to accept the concept without
trying too hard to visualize it in your normal three-dimensional world view. Here are
three counter-intuitive ideas, the seeds of which are present in the recent outcomes
of experiment #2 and its follow-up:


1. Rather than electron spin being modeled by classical three dimensional unit
vectors

\[ \begin{pmatrix} S_x \\ S_y \\ S_z \end{pmatrix} = \begin{pmatrix} 1 \\ \theta \\ \phi \end{pmatrix}_{\text{Sph}} \]

in a real vector space with basis $\{\hat{x}, \hat{y}, \hat{z}\}$, we are heading toward a model
where spin states are represented by two dimensional unit vectors
in a complex vector space with basis $\{ |+\rangle_z, |-\rangle_z \}$.

2. In contrast to classical spin, where the unit vectors with z-components +1 and
-1 are merely scalar multiples of the same basis vector $\hat{z}$,

\[ \begin{pmatrix} 0 \\ 0 \\ +1 \end{pmatrix} = (+1)\,\hat{z} \quad \text{and} \quad \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix} = (-1)\,\hat{z} , \]

we are positing a model in which the two polar opposite z-spin states, $|+\rangle = |+\rangle_z$
and $|-\rangle = |-\rangle_z$, are linearly independent of one another.

3. In even starker contrast to classical spin, where the unit vector $\hat{x}$ is linearly
independent of $\hat{z}$, the experiments seem to suggest that unit vector $|-\rangle_x$ can be
formed by taking a linear combination of $|+\rangle$ and $|-\rangle$, specifically

\[ |-\rangle_x = \frac{|+\rangle - |-\rangle}{\sqrt{2}} . \]

But don't give up on the spherical coordinates $\theta$ and $\phi$ just yet. They have a role
to play, and when we study expectation values and the Bloch sphere, you'll see what
that role is. Meanwhile, we have one more experiment to perform.


6.6 Refining Our Model: Experiment #3 


6.6.1 What we Know 


We have learned that measuring one of the scalars related to spin, $S_x$, $S_z$ or any $S_{\hat{n}}$,
has two effects:

1. It collapses the electrons into one of two spin states for that observable, one that
produces a reading of up $(+) = (+\frac{\hbar}{2})$, and the other that produces a reading
of down $(-) = (-\frac{\hbar}{2})$.


2. It destroys any information about the other spin axes, or in quantum-ese, about 
the other spin observables. 


("Up/down" might be better stated as "left/right" when referring to x-coordinates,
and "forward/backward" when referring to y-coordinates, but it's safer to use the
consistent terminology of spin-up/spin-down for any projection axis, not just the
z-axis.)

There is something even more subtle we can learn, and this final experiment will 
tease that extra information out of the stubborn states. 


6.6.2 The Experiment 


We can prepare a pure state that is between $|+\rangle$ and $|-\rangle$ if we are clever. We direct
our physicists to rotate the first apparatus relative to the second apparatus by an
angle $\theta$ counter-clockwise from the z-axis (the axis of rotation being the y-axis, which
ensures that the x-z plane is rotated into itself; see Figure 6.18).


Figure 6.18: A spin direction with polar angle $\theta$ from +z, represented by $|\psi\rangle$


Let’s call this rotated state “$|\psi\rangle$,” just so it has a name that is distinct from $|+\rangle$
and $|-\rangle$.

If we only rotate by a tiny $\theta$, we have a high dose of $|+\rangle$ and a small dose of $|-\rangle$ in
our rotated state, $|\psi\rangle$. On the other hand, if we rotate by nearly 180° ($\pi$ radians), $|\psi\rangle$
would have mostly $|-\rangle$ and very little $|+\rangle$ in it. Before this lesson ends, we’ll prove
that the right way to express the relationship between $\theta$ and the relative amounts of
$|+\rangle$ and $|-\rangle$ contained in $|\psi\rangle$ is

$$|\psi\rangle = \cos\!\left(\frac{\theta}{2}\right)|+\rangle + \sin\!\left(\frac{\theta}{2}\right)|-\rangle.$$




By selecting the same (+) group coming out of the first apparatus (but now tilted at
an angle $\theta$) as input into the second apparatus, we have effectively changed our input
states going into the second apparatus from purely $|+\rangle$ to purely $|\psi\rangle$.


We now measure $S_z$, the spin projected onto the z-axis. The exact features of
what I just described can be stated using the earlier three-bullet format.


1. The States. This time the input electrons are in a specific state, $|\psi\rangle$, whose
z-spin direction forms an angle $\theta$ from the z-axis (and for specificity, whose
spherical coordinate for the azimuthal angle, $\phi$, is 0). (See Figure 6.19.)


Figure 6.19: The prepared state for experiment #3, prior to measurement 


2. The Measurement. We follow the preparation of this rotated state with an
$S_z$ measurement.


3. The Classical Expectation. We’ve been around the block enough to realize
that we shouldn’t expect a classical result. If this were a purely classical situation,
the spin magnitude, $\frac{\sqrt{3}}{2}\hbar$, would lead to $S_z = \frac{\sqrt{3}}{2}\hbar\cos\theta$. But we already
know the largest $S_z$ ever “reads” is $\frac{1}{2}\hbar$, so maybe we attenuate that number
by $\cos\theta$, and predict $\frac{1}{2}\hbar\cos\theta$. Those are the only two ideas we have at the
moment. (See Figure 6.20.)


Figure 6.20: Classical, or semi-classical expectation of experiment #3’s measurement




6.6.3 The Actual Results


It is perhaps not surprising that we always read one of two z-values of spin: $S_z = +\hbar/2$
and $S_z = -\hbar/2$. The two readings occur somewhat randomly.


However, closer analysis reveals that they are not equally likely. As we try different
values of $\theta$ and tally the results, we get the summary shown in Figure 6.21.


  $S_z = +\hbar/2$    $|+\rangle$    $\cos^2(\theta/2) \cdot 100\%$
  $S_z = -\hbar/2$    $|-\rangle$    $\sin^2(\theta/2) \cdot 100\%$


Figure 6.21: Probabilities of measuring $|\pm\rangle$ from starting state $|\psi\rangle$, $\theta$ from +z


In other words,

$$\text{fraction of outcomes which} = +\frac{\hbar}{2} \;\;\longrightarrow\;\; \cos^2\!\left(\frac{\theta}{2}\right)\cdot 100\% \quad\text{and}$$

$$\text{fraction of outcomes which} = -\frac{\hbar}{2} \;\;\longrightarrow\;\; \sin^2\!\left(\frac{\theta}{2}\right)\cdot 100\%.$$


Notice how nicely this agrees with our discussion of experiment #2. There, we
prepared a $|-\rangle_x$ state to go into the final $S_z$ tester. That state is intuitively 90° from the
z-axis, so in that experiment our $\theta$ was 90°. That would make $\theta/2 = 45^\circ$, whose cosine
and sine are both $\frac{\sqrt{2}}{2}$. For these values, the formula above gives a predicted 50%
(i.e., equal) frequency to each outcome, (+) and (-), and that’s exactly what we
found when we measured $S_z$ starting with $|-\rangle_x$ electrons.
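The tallies of experiment #3 can be reproduced numerically. What follows is a minimal Python sketch, not part of the original experiments: the function name `z_measurement_probabilities` is made up for illustration, and it simply evaluates $\cos^2(\theta/2)$ and $\sin^2(\theta/2)$ for a few tilt angles.

```python
import math


def z_measurement_probabilities(theta):
    """Return (P(+), P(-)) for an S_z measurement on the state
    cos(theta/2)|+> + sin(theta/2)|->, with theta given in radians."""
    p_plus = math.cos(theta / 2) ** 2
    p_minus = math.sin(theta / 2) ** 2
    return p_plus, p_minus


# Tabulate a few tilt angles between 0 and 180 degrees.
for degrees in (0, 45, 90, 135, 180):
    p_plus, p_minus = z_measurement_probabilities(math.radians(degrees))
    print(f"theta = {degrees:3d} deg   P(+) = {p_plus:.3f}   P(-) = {p_minus:.3f}")
```

At 90° the two probabilities come out equal, which is the 50/50 split of experiment #2, and for every angle the two values add to 1.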


6.6.4 The Quantum Reality 


Apparently, no matter what quantum state our electron’s spin is in, we always measure
the magnitude projected onto an axis as $\pm\hbar/2$. We suspected this after the
first two experiments, but now firmly believe it. The only thing we haven’t tried is
to rotate the first apparatus into its most general orientation, one that includes a
non-zero azimuthal $\phi$, but that would not change the results.




This also settles a debate that you might have been waging, mentally. Is spin, prior
to an $S_z$ measurement, actually in some combination of the states $|+\rangle$ and $|-\rangle$? Yes.
Rotating the first apparatus relative to the second apparatus by a particular $\theta$ has a
physical impact on the outcomes. Even though the electrons collapse into one of $|+\rangle$
and $|-\rangle$ after the measurement, there is a difference between $|\psi\rangle = .45\,|+\rangle + .89\,|-\rangle$
and $|\psi'\rangle = .71\,|+\rangle + .71\,|-\rangle$: The first produces 20% (+) measurements, and the
second produces 50% (+) measurements.


6.6.5 Quantum Mechanics Lesson #3 


If we prepare a state to be a normalized (length one) linear combination of the two
$S_z$ states, $|+\rangle$ and $|-\rangle$, the two coefficients predict the probability of obtaining each
of the two possible eigenvalues $\pm\frac{\hbar}{2}$ for an $S_z$ measurement. This is the meaning of
the coefficients of a state when expressed relative to the preferred z-basis.


This holds for any direction. If we wish to measure the “projection of S onto
an arbitrary direction $\hat{n}$” (in quotes because I’m using slightly inaccurate classical
terminology), then we would expand our input state along the $\hat{n}$ state vectors,

$$|\psi\rangle = \alpha\,|+\rangle_{\hat{n}} + \beta\,|-\rangle_{\hat{n}},$$

and $|\alpha|^2$ and $|\beta|^2$ would give the desired probabilities.
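As a small illustration of this lesson, the sketch below turns a pair of coefficients into outcome probabilities by taking magnitudes squared. The function name `outcome_probabilities` is hypothetical, and the division by the total weight is an added assumption: it absorbs the rounding in two-decimal coefficients such as .45 and .89, which are not exactly normalized.

```python
def outcome_probabilities(alpha, beta):
    """Probabilities of the (+) and (-) readings for the state
    alpha|+> + beta|->, where alpha and beta may be complex."""
    weight_plus = abs(alpha) ** 2
    weight_minus = abs(beta) ** 2
    # Assumption: renormalize, so rounded coefficients still give
    # probabilities that sum to one.
    total = weight_plus + weight_minus
    return weight_plus / total, weight_minus / total


# The two prepared states compared in section 6.6.4.
for label, alpha, beta in (("psi ", 0.45, 0.89), ("psi'", 0.71, 0.71)):
    p_plus, p_minus = outcome_probabilities(alpha, beta)
    print(f"{label}: P(+) = {p_plus:.2f}, P(-) = {p_minus:.2f}")
# psi : P(+) = 0.20, P(-) = 0.80
# psi': P(+) = 0.50, P(-) = 0.50
```

Because only the magnitudes enter, a complex coefficient such as $i/\sqrt{2}$ yields the same probability as $1/\sqrt{2}$.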


The quantum physicist will tell us that spin of an electron relative to a pre- 
determined direction can be in any of infinitely many states. However, when we 
measure, it will collapse into one of two possible states, one up and one down. Fur- 
thermore the relative number of the up measurements vs. down measurements is 
predicted precisely by the magnitude-squared of the coefficients of the prepared state. 


Finally, we can use any two polar opposite directions in $\mathbb{R}^3$ (which we shall learn
correspond to two orthonormal vectors in $\mathbb{C}^2$) as a “basis” for defining the spin state.
$\{\,|+\rangle, |-\rangle\,\}$, the z-component measurements, are the most common, and have come
to be regarded as the preferred basis, but we can use $\{\,|+\rangle_x, |-\rangle_x\,\}$, or any two
oppositely directed unit vectors (i.e., orientations). Such pairs are considered “alternate
bases” for the space of spin states. In fact, as quantum computer scientists, we’ll use
these alternate bases frequently.


6.6.6 Third Adjustment to the Model 


We’re now close enough to an actual model of quantum spin 1/2 particles to skip 
the game of making a minor adjustment to our evolving picture. In the next lesson 
we'll add the new information about the meaning of the scalar coefficients and the 
probabilities they reflect and give the full Hilbert space description of spin 1/2 physics 
motivated by our three experiments. 


Also, we’re ready to leave the physical world, at least until we study time evolution. 
We can thank our experimental physicists and let them get back to their job of 
designing a quantum computer. We have a model to build and algorithms to design. 




6.7 Onward to Formalism 


In physics the word “formalism” refers to the abstract mathematical notation neces- 
sary to scribble predictions about what a physical system such as a quantum computer 
will do if we build it. For our purposes, the formalism is what we need to know, and 
that’s the content of the next two chapters. They will provide a strict but limited set 
of rules that we can use to accurately understand and design quantum algorithms. 


Of those chapters, only the next (the second of these three) is necessary for CS 
83A, but what you have just read has prepared you with a sound intuition about 
the properties and techniques that constitute the formalism of quantum mechanics, 
especially in the case of the spin 1/2 model used in quantum computing. 




Chapter 7 


Time Independent Quantum 
Mechanics 


7.1 Quantum Mechanics for Quantum Computing 


This is the second in a three-chapter introduction to quantum mechanics and the only 
chapter that is required for the first course, CS 83A. A brief and light reading of the 
prior lesson in which we introduce about a half dozen conceptual experiments will 
help give this chapter context. However, even if you skipped that on first reading, 
you should still find the following to be self-contained. 


Our goal today is twofold. 


1. We want you to master the notation used by physicists and computer scientists 
to scribble, calculate and analyze quantum algorithms and their associated logic 
circuits. 


2. We want you to be able to recognize and make practical use of the direct
correspondence between the math and the physical quantum circuitry.


By developing this knowledge, you will learn how manipulating symbols on paper 
affects the design and analysis of actual algorithms, hardware logic gates and mea- 
surements of output registers. 


7.2 The Properties of Time-Independent Quantum 
Mechanics 

Let’s do some quantum mechanics. We’ll be stating a series of properties (I’ll call
them traits) that will either be unprovable but experimentally verified postulates, or
provable consequences of those postulates. We’ll be citing them all in our study of
quantum computing.


In this lesson, time will not be a variable; the physics and mathematics pertain
to a single instant.


7.2.1 The State Space of a Physical System 
Physical System (Again) 


Let’s define (or re-define if you read the last chapter) a physical system $\mathcal{S}$ to be some
conceptual or actual apparatus that has a carefully controlled and limited number of
measurable states. It’s implied that there are very few aspects of $\mathcal{S}$ that can change
or be measured, since otherwise it would be too chaotic to lend itself to analysis.


Consider two examples. The first is studied early in an undergraduate quantum 
mechanics course; we won’t dwell on it. The second forms the basis of quantum 
computing, so we’ll be giving it considerable play time. 


A Particle’s Energy and Position in One-Dimension 


We build hardware that allows a particle (mass = m) to move freely in one dimension 
between two boundaries, say from 0 cm to 5 cm. The particle can’t get out of that 1- 
dimensional “box” but is otherwise free to roam around inside (no forces acting). We 
build an apparatus to test the particle’s energy. Using elementary quantum mechanics 
we discover that the particle’s energy measurement can only attain certain discrete
values, $E_0, E_1, E_2, \ldots$. Furthermore, once we know which energy, $E_k$, the particle
has, we can form a probability curve that predicts the likelihood of finding the particle
at various locations within the interval [0, 5].


Spin: A Stern-Gerlach Apparatus 


We shoot a single silver atom through a properly oriented inhomogeneous magnetic 
field. The atom’s deflection, up or down, will depend on the atom’s overall angular 
momentum. By placing two or more of these systems in series with various angular 
orientations, we can study the spin of the outermost electron in directions $S_x$, $S_y$,
$S_z$ or general $S_{\hat{n}}$. While not practical for quantum computation, this apparatus is
exactly how the earlier spin-1/2 results were obtained.


[Optional.] If you are curious, here are a few details about the Stern-Gerlach
apparatus.


• Silver Atoms Work Well. You might think this is too complex a system for
measuring a single electron’s spin. After all, there are 47 electrons and each has
spin, not to mention a confounding orbital angular momentum from its motion
about the nucleus. However, orbital angular momentum is net-zero due to
statistically random paths, and the spins of the inner 46 electrons cancel (they
are paired, one up, one down). This leaves only the spin of the outermost
electron #47 to account for the spin of the atom as a whole. (Two other facts
recommend the use of silver atoms. First, we can’t use charged particles since
the so-called Lorentz force would overshadow the subtle spin effects. Second,
silver atoms are heavy enough that their deflection can be calculated based
solely on classical equations.)


• We Prepare a Fixed Initial State. An atom can be prepared in a spin state
associated with any pre-determined direction, $\hat{n}$, prior to subjecting it to a final
Stern-Gerlach tester. We do this by selecting a $|+\rangle$ electron from a preliminary
Stern-Gerlach $S_z$ tester, then orienting a second tester in an $\hat{n}$ direction relative
to the original.


• The Measurements and Outcomes. The deflection of the silver atom is detected
as it hits a collector plate at the far end of the last apparatus, giving us the
measurements $\pm\frac{\hbar}{2}$, and therefore the collapsed states $|+\rangle_{\hat{n}}$, $|-\rangle_{\hat{n}}$, etc. The
results correspond precisely with our experiments #1, #2 and #3 discussed
earlier.


Stern-Gerlach is the physical system to keep in mind as you study the math that 
follows. 


7.3 The First Postulate of Quantum Mechanics 


7.3.1 Trait #1 (The State Space) 


For any physical system, $\mathcal{S}$, we associate a Hilbert space, $\mathcal{H}$. Each physical state in
$\mathcal{S}$ corresponds to some ray in $\mathcal{H}$, or stated another way, a point on the projective
sphere (all unit vectors) of $\mathcal{H}$.

$$\text{physical state} \in \mathcal{S} \quad\longleftrightarrow\quad v \in \mathcal{H}, \ \ |v| = 1.$$

The Hilbert space $\mathcal{H}$ is often called the state space of the system $\mathcal{S}$, or just the state
space.


Vocabulary and Notation. Physicists and quantum computer scientists alike
express the vectors in the state space using “ket” notation. This means a vector in
state space is written

$$|\psi\rangle,$$

and is usually referred to as a ket. The Greek letter $\psi$ is typically used to label any
old state. As needed we will be replacing it with specific and individual labels when
we want to differentiate two state vectors, express something known about the vector,
or discuss a famous vector that is universally labeled. Examples we will encounter
include


$$|a\rangle, \ \ |u_k\rangle, \ \ |+\rangle, \ \ \text{and} \ \ |+\rangle_x.$$


When studying Hilbert spaces, I mentioned that a single physical state corre- 
sponds to an infinite number of vectors, all on the same ray, so we typically choose 
a “normalized” representative having unit length. It’s the job of quantum physicists 
to describe how to match the physical states with normalized vectors and ours as 
quantum computer scientists to understand and respect the correspondence. 


7.3.2 The Fundamental State Space for Quantum Computing 


Based on our experiments involving spin-1/2 systems from the (optional) previous
chapter, we were led to the conclusion that any spin state can be described by a
linear combination of two special states $|+\rangle$ and $|-\rangle$. If you skipped that chapter,
this will serve to officially define those two states. They are the natural basis kets of a
2-dimensional complex Hilbert space, $\mathcal{H}$. In other words, we construct a simple vector
space of complex ordered pairs with the usual complex inner product and decree the
natural basis to be the two measurable states $|+\rangle$ and $|-\rangle$ in $\mathcal{S}$. Symbolically,

$$\mathcal{H} \equiv \left\{ \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \;\middle|\; \alpha, \beta \in \mathbb{C} \right\} \quad\text{with}\quad |+\rangle \leftrightarrow \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |-\rangle \leftrightarrow \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

In this regime, any physical spin state $|\psi\rangle \in \mathcal{H}$ can be expressed as a normalized
vector expanded along this natural basis using

$$|\psi\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha\,|+\rangle + \beta\,|-\rangle, \quad\text{where}\quad |\alpha|^2 + |\beta|^2 = 1.$$

The length requirement reflects that physical states reside on the projective sphere of
$\mathcal{H}$.


[Exercise. Demonstrate that $\{\,|+\rangle, |-\rangle\,\}$ is an orthonormal pair. Caution:
Even though this may seem trivial, be sure you are using the complex, not the real,
inner product.]


The Orthonormality Expressions 


In the heat of a big quantum computation, basis kets will kill each other off, turning
themselves into the scalars 0 and 1 because the last exercise says that

$$\langle +|+\rangle = \langle -|-\rangle = 1, \quad\text{and}$$

$$\langle +|-\rangle = \langle -|+\rangle = 0.$$

While this doesn’t rise to the level of “trait”, memorize it. Every quantum mechanic 
relies on it. 


[Exercise. Demonstrate that the set $\{\,|+\rangle, |-\rangle\,\}$ forms a basis (the z-basis) for
$\mathcal{H}$. Hint: Even though only the projective sphere models $\mathcal{S}$, we still have to account
for the entire expanse of $\mathcal{H}$ including all the vectors off the unit-sphere if we are going
to make claims about “spanning the space.”]


The x-Basis for $\mathcal{H}$.


Let’s complete a thought we started in our discussion of experiment #2 from last
time. We did a little hand-waving to suggest

$$|-\rangle_x = \frac{|+\rangle - |-\rangle}{\sqrt{2}},$$

but now we can make this official (or, if you skipped the last chapter, let this serve
as the definition of two new kets).

$$|-\rangle_x \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix},$$

that is, it is the vector in $\mathcal{H}$ whose coordinates along the z-basis are as shown. We
may as well define the $|+\rangle_x$ vector. It is

$$|+\rangle_x \equiv \frac{|+\rangle + |-\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

[Exercise. Demonstrate that $\{\,|+\rangle_x, |-\rangle_x\,\}$ is an orthonormal pair.]
[Exercise. Demonstrate that the set $\{\,|+\rangle_x, |-\rangle_x\,\}$ forms a basis (the x-basis)
for $\mathcal{H}$.]
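A numerical spot check is no substitute for the exercises, but it can catch a slip with the conjugate. The sketch below assumes the z-basis coordinates $(1, 0)^t$, $(0, 1)^t$ for $|\pm\rangle$ and $(1, \pm 1)^t/\sqrt{2}$ for $|\pm\rangle_x$; the helper names `inner` and `is_orthonormal_pair` are made up for illustration.

```python
import math


def inner(u, v):
    """Complex inner product <u|v>; the bra (first) vector is conjugated."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def is_orthonormal_pair(u, v, tol=1e-12):
    """True when both vectors have unit length and <u|v> = 0."""
    return (abs(inner(u, u) - 1) < tol
            and abs(inner(v, v) - 1) < tol
            and abs(inner(u, v)) < tol)


S = 1 / math.sqrt(2)
PLUS_Z, MINUS_Z = [1, 0], [0, 1]      # z-basis kets
PLUS_X, MINUS_X = [S, S], [S, -S]     # x-basis kets, in z-basis coordinates

print(is_orthonormal_pair(PLUS_Z, MINUS_Z))   # True
print(is_orthonormal_pair(PLUS_X, MINUS_X))   # True
print(is_orthonormal_pair(PLUS_Z, PLUS_X))    # False: these two overlap
```

The last line shows that kets taken from two different bases are generally not orthogonal to one another.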


7.3.3 Why Does a Projective $\mathcal{H}$ Model Spin-1/2?


There may be two questions (at least) that you are inclined to ask. 


1. Why do we need a complex vector space — as opposed to a real one — to represent 
spin? 


2. Why do spin states have to live on the projective sphere? Why not any point 
in H or perhaps the sphere of radius 94022? 


I can answer item 1 now (and item 2 further down the page). Obviously, there was
nothing magical about the z-axis or the x-axis. I could have selected any direction
in which to start my experiments at the beginning of the last lesson and then picked
any other axis for the second apparatus. In particular, I might have selected the
same z-axis for the first measurement, but used the y-axis for the second one. Our
interpretation of the results would then have suggested that $|-\rangle_y$ contains equal parts
$|+\rangle$ and $|-\rangle$,

$$|-\rangle_y = c_+\,|+\rangle + c_-\,|-\rangle,$$

and similarly for $|+\rangle_y$. If we were forced to use real scalars, we would have to pick the
same two scalars $\pm\frac{1}{\sqrt{2}}$ for $c_\pm$ (although we could choose which got the + and which
got the - sign, a meaningless difference). We’d end up with

$$|\pm\rangle_y = \frac{|+\rangle \pm |-\rangle}{\sqrt{2}} \qquad (\text{warning: not true}),$$

which would force them to be identical to the vectors $|+\rangle_x$ and $|-\rangle_x$, perhaps with the
order of the vectors reversed. But this can’t be true, since the x-kets and the y-kets
can no more be identical to each other than either to the z-kets, and certainly neither
are identical to the z-kets. (If they were, then repeated testing would never have split
the original $|+\rangle$ into two equal groups, $|+\rangle_x$ and $|-\rangle_x$.) So there are just not enough
real numbers to form a third pair of basis vectors in the y-direction, distinct from the
x-basis and the z-basis.


If we allow complex scalars, the problem goes away. We can define

$$|\pm\rangle_y \equiv \frac{|+\rangle \pm i\,|-\rangle}{\sqrt{2}}.$$

Now all three pairs are totally different orthonormal bases for $\mathcal{H}$, yet each one contains
“equal amounts” of $|+\rangle$ and $|-\rangle$.
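To see the complex scalars doing their job, here is a short worked check. It assumes the standard z-basis coordinates $(1, \pm i)^t/\sqrt{2}$ for $|\pm\rangle_y$ and $(1, 1)^t/\sqrt{2}$ for $|+\rangle_x$, and it uses the complex inner product, which conjugates the entries of the bra.

```latex
\begin{align*}
{}_y\langle +|+\rangle_y &= \tfrac{1}{2}\left( 1^{*}\cdot 1 + i^{*}\cdot i \right)
    = \tfrac{1}{2}\,(1 + 1) = 1, \\
{}_y\langle +|-\rangle_y &= \tfrac{1}{2}\left( 1^{*}\cdot 1 + i^{*}\cdot (-i) \right)
    = \tfrac{1}{2}\,(1 - 1) = 0, \\
\left|\, {}_x\langle +|+\rangle_y \right|^{2} &= \left| \tfrac{1}{2}\,(1 + i) \right|^{2}
    = \tfrac{1}{2}.
\end{align*}
```

The first two lines say the y-kets form an orthonormal pair. The third is strictly less than 1, so $|+\rangle_y$ does not lie on the same ray as $|+\rangle_x$: the y-basis really is a third, distinct basis. Note what happens if the conjugate is forgotten: the first line would read $\tfrac{1}{2}(1 + i^2) = 0$, a "unit" vector of length zero, which is why the complex inner product is required.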


7.4 The Second Postulate of Quantum Mechanics 


Trait #1 seemed natural after understanding the experimental outcomes of the spin- 
1/2 apparatus. Trait #2 is going to require that we cross a small abstract bridge, 
but I promise, you have all the math necessary to get to the other side. 


7.4.1 Trait #2 (The Operator for an Observable) 


An observable quantity A (in $\mathcal{S}$) corresponds to an operator (a linear transformation)
in $\mathcal{H}$. The operator for an observable is always Hermitian.


[Note. There is a Trait #2a that goes along with this, but you don’t have the
vocabulary yet, so we defer for the time being.]


In case you forgot what Hermitian means, the following symbolic version of Trait
#2, which I’ll call Trait #2′, should refresh your memory.




Trait #2′ (Mathematical Version of Operator for an Observable):

$$\text{Observable } A \in \mathcal{S} \quad\longleftrightarrow\quad T_A : \mathcal{H} \to \mathcal{H} \ \text{ linear and } \ T_A^\dagger = T_A.$$

[Review. $T_A^\dagger = T_A$ is the Hermitian condition. The Hermitian condition on a
matrix M means that M’s adjoint (conjugate transpose), $M^\dagger$, is the same as M. The
Hermitian condition on a linear transformation means that its matrix (in any basis)
is its own adjoint. Thus, Trait #2′ says that the matrix, $T_A$, which purports to
correspond to the observable A, must be self-adjoint. We’ll reinforce all this in the
first example.]

Don’t feel bad if you didn’t expect this kind of definition; nothing we did above
suggested that there were linear transformations behind the observables $S_x$, $S_z$, etc.,
never mind that they had to be Hermitian. I’m telling you now, there are and they
do. This is an experimentally verified observation, not something we can prove, so it 
is up to the theorists to guess the operator that corresponds to a specific observable. 
Also, it is not obvious what we can do with the linear transformation of an observable, 
even if we discover what it is. All in due time. 


7.4.2 The Observable $S_z$


The linear transformation for $S_z$, associated with the z-component of electron spin
(a concept from the optional previous chapter), is represented by the matrix

$$\begin{pmatrix} \frac{\hbar}{2} & 0 \\ 0 & -\frac{\hbar}{2} \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$


Blank stares from the audience.


Never fear. You'll soon understand why multiplication by this matrix is the
operator (in $\mathcal{H}$) chosen to represent the observable $S_z$ (in the physical system $\mathcal{S}$). And
if you did not read that chapter, just take this matrix as the definition of $S_z$.


Notation. Sometimes we abandon the constant $\hbar/2$ and use a simpler matrix to
which we give the name $\sigma_z$, a.k.a. the Pauli spin matrix (in the z-direction),

$$\sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$


Meanwhile, the least we can do is to confirm that the matrix for $S_z$ is Hermitian.
[Exercise. Prove the matrix for $S_z$ is Hermitian.]


We will now start referring to

i) the observable “spin projected onto the z-axis,”

ii) its associated linear operator, and

iii) the matrix for that operator

all using the one name, $S_z$.


Which Basis? Everything we do in this chapter (and in the entire volume)
assumes, unless explicitly stated otherwise, that vectors and matrices are represented
in the z-basis, which is the preferred basis that emerges from the $S_z$ observable. There
are other bases, but we’ll label things if and when we expand along them.


As for the linear operators associated with the measurements of the other two
major spin directions $S_x$ and $S_y$, they turn out to be

$$S_x = \frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

$$S_y = \frac{\hbar}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},$$

both represented in terms of the z-basis (even though these operators do define their
own bases).


Without the factor of $\hbar/2$ these, too, get the title of Pauli spin matrices (in the x-
and y-directions, respectively). Collecting the three Pauli spin matrices for reference,
we have

$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.$$


Tip. This also demonstrates a universal truth. Any observable expressed in its
own basis is always a diagonal matrix with the eigenvalues appearing along that
diagonal. Because the actual observables’ matrices are the Pauli operators with the
factor of $\hbar/2$ out front, we can see that the eigenvalues of $S_z$ do, in fact, appear along
that matrix’s diagonal.
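The Hermitian condition of Trait #2 is easy to test by machine. The following sketch stores the three Pauli matrices as nested Python lists and compares each with its conjugate transpose. The helper names `dagger` and `is_hermitian` and the counterexample `NOT_HERMITIAN` are illustrative choices, not notation from this course.

```python
def dagger(m):
    """Adjoint (conjugate transpose) of a matrix stored as a list of rows."""
    rows, cols = len(m), len(m[0])
    return [[complex(m[r][c]).conjugate() for r in range(rows)]
            for c in range(cols)]


def is_hermitian(m):
    """True when the matrix equals its own adjoint."""
    return dagger(m) == [[complex(x) for x in row] for row in m]


SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
NOT_HERMITIAN = [[0, 1j], [1j, 0]]   # symmetric, but not self-adjoint

for name, m in (("sigma_x", SIGMA_X), ("sigma_y", SIGMA_Y),
                ("sigma_z", SIGMA_Z), ("counterexample", NOT_HERMITIAN)):
    print(name, is_hermitian(m))
```

Multiplying a Hermitian matrix by the real constant $\hbar/2$ leaves it Hermitian, so the same conclusion carries over to $S_x$, $S_y$ and $S_z$. The counterexample shows that being symmetric is not enough once complex entries appear.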


7.5 The Third Postulate of Quantum Mechanics 


This trait will add some math vocabulary and a bit of quantum computational skill 
to our arsenal. 


7.5.1 Trait #3 (The Eigenvalues of an Observable) 


The only possible measurement outcomes of an observable quantity A are special real
numbers, $a_1, a_2, a_3, \ldots$, associated with the operator’s matrix. The special numbers
$a_1, a_2, a_3, \ldots$ are known as eigenvalues of the matrix.


[Note to Physicists. I’m only considering finite or countable eigenvalues because
our study of quantum computing always has only finitely many special $a_k$. In physics,
when measuring position or momentum, the set of eigenvalues is continuous
(non-enumerable).]


Obviously, we have to learn what an eigenvalue is. Whatever it is, when we
compute the eigenvalues of, say, the matrix $S_z$, this trait tells us that they must be
$\pm\hbar/2$, since those were the values we measured when we tested the observable $S_z$. If
we had not already done the experiment but knew that the matrix for the observable
was $S_z$ above, Trait #3 would allow us to predict the possible outcomes, something
we will do now.


7.5.2 Eigenvectors and Eigenvalues 


Given any complex or real matrix M, we can often find certain special non-zero
vectors called eigenvectors for M. An eigenvector u (for M) has the property that

$$u \neq 0 \quad\text{and}$$

$$Mu = au, \quad\text{for some (possibly complex) scalar } a.$$

In words, applying M to the non-zero u results in a scalar multiple of u. The scalar
multiple, a, associated with u is called an eigenvalue of M, while u is the (an)
eigenvector belonging to a. This creates a possible collection of special
eigenvector-eigenvalue pairs,

$$\left\{\, u_{a_k} \leftrightarrow a_k \,\right\}_{k=1}^{n \text{ or } \infty}$$

for any matrix M.


Note. There may be more than one eigenvector, $u_a, u'_a, u''_a, \ldots$, for a given
eigenvalue, a, but in this course, we’ll typically see the eigenvalues and eigenvectors
uniquely paired. More on this, shortly.


[Exercise. Give two reasons we require eigenvectors to be non-zero. One should
be based purely on the arithmetic of any vector space, and a second should spring
from the physical need to use projective Hilbert spaces as our quantum system models.
Hint: Both reasons are very simple.]


[Exercise. Show that if u is an eigenvector for M, then any non-zero scalar multiple, $\alpha u$,
is also an eigenvector.]


In light of the last exercise, when we find an eigenvector for a matrix, we consider
all of its scalar multiples to be the same eigenvector; we only consider eigenvectors
to be distinct if they are not scalar multiples of one another. Since we will be working
with projective Hilbert spaces, whenever we solve for an eigenvector, we always
normalize it (divide by its magnitude) to force it onto the projective sphere. From
there, we can also multiply by a unit-length complex number, $e^{i\theta}$, in order to put it
into a particularly neat form.


Vocabulary. If an eigenvalue, a, corresponds to only one eigenvector, $u_a$, it is
called a non-degenerate eigenvalue. However, if a works for two or more (linearly
independent, understood) eigenvectors, $u_a, u'_a, \ldots$, it is called a degenerate eigenvalue.




There are two facts that I will state without proof. (They are easy enough to be
exercises.)


• Uniqueness. Non-degenerate eigenvalues have unique (up to scalar multiple)
eigenvectors. Degenerate eigenvalues do not have unique eigenvectors.


• Diagonality. When the eigenvectors of a matrix, M, form a basis for the vector
space, we call it an eigenbasis for the space. M, expressed as a matrix in its
own eigenbasis, is a diagonal matrix (0 everywhere except for the diagonal from
position 1-1 to n-n).


[Exercise. Prove one or both of these facts.]
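The diagonality fact can be watched in action on a small case. The sketch below is an illustration under stated assumptions rather than a proof: it takes the Pauli matrix $\sigma_x$ (written in the z-basis), takes the x-basis kets with coordinates $(1, \pm 1)^t/\sqrt{2}$, and computes the matrix elements $\langle e_i | M | e_j \rangle$. The helper names are made up for this example.

```python
import math


def inner(u, v):
    """Complex inner product <u|v>; the bra (first) vector is conjugated."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def mat_vec(m, u):
    """Matrix times column vector."""
    return [sum(m[r][c] * u[c] for c in range(len(u))) for r in range(len(m))]


def matrix_in_basis(m, basis):
    """Matrix elements <e_i| M |e_j> of m relative to an orthonormal basis."""
    return [[inner(e_i, mat_vec(m, e_j)) for e_j in basis] for e_i in basis]


S = 1 / math.sqrt(2)
SIGMA_X = [[0, 1], [1, 0]]        # written in the z-basis
X_BASIS = [[S, S], [S, -S]]       # |+>_x and |->_x, in z-basis coordinates

for row in matrix_in_basis(SIGMA_X, X_BASIS):
    print([round(entry.real, 12) for entry in row])
```

Up to rounding, the result is the diagonal matrix with +1 and -1 on the diagonal: the off-diagonal entries vanish and the eigenvalues of $\sigma_x$ sit on the diagonal.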


7.5.3 The Eigenvalues and Eigenvectors of $S_z$


Let’s examine $M = S_z$. Most vectors (say, $(1, 2)^t$, for example) are not eigenvectors
of $S_z$. For example,

$$\frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} 1 \\ -2 \end{pmatrix} \neq a\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

for any complex scalar, a. However, the vector $(1, 0)^t$ is an eigenvector, as

$$\frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

demonstrates. It also tells us that $\frac{\hbar}{2}$ is the eigenvalue associated with the vector
$(1, 0)^t$, which is exactly what Trait #3 requires.

[Exercise. Show that $(0, 1)^t$ and $-\frac{\hbar}{2}$ form another eigenvector-eigenvalue pair for
$S_z$.]


This confirms that Trait #3 works for $S_z$; we have identified the eigenvalues for
$S_z$ and they do, indeed, represent the only measurable values of the observable $S_z$ in
our experiments.
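The eigenvector test used above, "does M u come out as a scalar multiple of u?", is mechanical enough to code. Here is one possible sketch; the function name `eigenvalue_of` is hypothetical, and the code works in units where $\hbar = 1$ (an assumption made only to keep the numbers simple).

```python
def mat_vec(m, u):
    """Matrix times column vector."""
    return [sum(m[r][c] * u[c] for c in range(len(u))) for r in range(len(m))]


def eigenvalue_of(m, u, tol=1e-12):
    """Return the scalar a with M u = a u, or None if u is not an eigenvector."""
    if all(abs(x) < tol for x in u):
        return None                       # the zero vector never qualifies
    mu = mat_vec(m, u)
    # Read the candidate scalar off the largest coordinate of u ...
    k = max(range(len(u)), key=lambda i: abs(u[i]))
    a = mu[k] / u[k]
    # ... then confirm that it works for every coordinate.
    if all(abs(mu[i] - a * u[i]) < tol for i in range(len(u))):
        return a
    return None


HBAR = 1.0                                # assumption: units where hbar = 1
S_Z = [[HBAR / 2, 0], [0, -HBAR / 2]]

print(eigenvalue_of(S_Z, [1, 0]))   # 0.5
print(eigenvalue_of(S_Z, [0, 1]))   # -0.5
print(eigenvalue_of(S_Z, [1, 2]))   # None
```

The first two calls recover $\pm\hbar/2$ (in these units), and the third confirms that $(1, 2)^t$ is not an eigenvector.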


All of this results in a more informative variant of Trait #3, which I'll call Trait
#3′.

Trait #3′: The only possible measurement outcomes of an observable, A, are the
solutions $\{a_k\}$ to the eigenvector-eigenvalue equation

$$T_A\,|u_k\rangle = a_k\,|u_k\rangle.$$

The values $\{a_k\}$ are always real, and are called the eigenvalues of the observable,
while their corresponding kets, $\{|u_k\rangle\}$, are called the eigenkets. If each eigenvalue has
a unique eigenket associated with it, the observable is called non-degenerate. On the
other hand, if there are two or more different eigenkets that make the equation true
for the same eigenvalue, that eigenvalue is called a degenerate eigenvalue, and the
observable is called degenerate.


You may be wondering why we can say that the eigenvalues of an observable are 
always real when we have mentioned that, for a general matrix operator, we can get 
complex eigenvalues. This is related to the theoretical definition of an observable 
which requires it to be of a special form that always has real eigenvalues. 


7.6 Computing Eigenvectors and Eigenvalues 


We take a short detour to acquire the skills needed to compute eigenvalues for any 
observable. 


The Eigenvalue Theorem. Given a matrix M, its eigenvalues are solutions to
the equation in the unknown $\lambda$

$$\det(M - \lambda I) = 0.$$

The equation in the Eigenvalue Theorem is known as the “characteristic equation”
for M.


Proof. Assume that a is an eigenvalue for M whose corresponding eigenvector is
u. Then

$$Mu = au \;\;\Longrightarrow\;\; Mu = aIu \;\;\Longrightarrow\;\; (M - aI)\,u = 0.$$

Keeping in mind that eigenvectors are always non-zero, we have shown that the matrix
$M - aI$ maps a non-zero u into 0. But that’s the hypothesis of the Little Inverse
Theorem “B” of our matrix lesson, so we get

$$\det(M - aI) = 0.$$

In words, a is a solution to the characteristic equation. QED


This tells us to solve the characteristic equation for an observable in order to
obtain its eigenvalues. Getting the eigenvectors from the eigenvalues is more of an
art and is best done using the special properties that your physical system $\mathcal{S}$ imparts
to the state space $\mathcal{H}$. For spin-1/2 state spaces, it is usually very easy and can be
done by hand.
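For a 2×2 matrix the characteristic equation is just a quadratic, $\lambda^2 - (\text{trace})\,\lambda + \det = 0$, so it can be solved with the quadratic formula. The sketch below does exactly that; the function name `eigenvalues_2x2` is made up, and the spin matrices are entered in units where $\hbar = 1$ (an assumption for readability).

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of det(M - lambda*I) = 0 for a 2x2 matrix M = [[a, b], [c, d]].

    Expanding the determinant gives lambda^2 - (a + d)*lambda + (a*d - b*c) = 0.
    """
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex-safe square root
    return (trace + root) / 2, (trace - root) / 2


# Spin observables in units where hbar = 1 (so hbar/2 = 0.5).
S_X = [[0, 0.5], [0.5, 0]]
S_Y = [[0, -0.5j], [0.5j, 0]]
S_Z = [[0.5, 0], [0, -0.5]]

for name, m in (("S_x", S_X), ("S_y", S_Y), ("S_z", S_Z)):
    print(name, eigenvalues_2x2(m))
```

All three observables produce the same pair of roots, $+\tfrac{1}{2}$ and $-\tfrac{1}{2}$, which is the statement that a spin measurement along any of these axes reads $\pm\hbar/2$.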


7.6.1 The Eigenvalues and Eigenvectors of $S_y$

The characteristic equation is

$$\det(S_y - \lambda I) = 0,$$

which is solved like so:

$$\begin{vmatrix} -\lambda & -\frac{i\hbar}{2} \\ \frac{i\hbar}{2} & -\lambda \end{vmatrix} = 0 \;\;\Longrightarrow\;\; \lambda^2 - \frac{\hbar^2}{4} = 0 \;\;\Longrightarrow\;\; \lambda = \pm\frac{\hbar}{2}.$$

Of course, we knew the answer, because we did the experiments (and in fact, the
theoreticians crafted the $S_y$ matrix based on the results of the experimentalists). Now
comes the fun part. We want to figure out the eigenvectors for these eigenvalues. Get
ready to do your first actual quantum mechanical calculation.


Eigenvector for $(+\hbar/2)$. The eigenvector has to satisfy

$$S_y\,u = +\frac{\hbar}{2}\,u,$$

so we view this as an equation in the unknown coordinates (expressed in the preferred
z-basis where we have been working all along),

$$\frac{\hbar}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}.$$

This reduces to

$$-i\,v_2 = v_1 \quad\text{and}$$

$$i\,v_1 = v_2.$$

There are two equations in two unknowns. Wrong. There are four unknowns (each
coordinate, $v_k$, is a complex number, defined by two real numbers). This is somewhat
expected since we know that the solution will be a ray of vectors all differing by a
complex scalar factor. We can solve for any one of the vectors on this ray as a first
step. We do this by guessing that this ray has some non-zero first coordinate (and if
we guess wrong, we would try again, the second time knowing that it must therefore
have a non-zero second coordinate. [Exercise. Why?]) Using this guess, we can pick
$v_1 = 1$, since any non-zero first coordinate can be made to equal 1 by a scalar multiple of
the entire vector. With this we get the complex equations

$$-i\,v_2 = 1$$

$$i = v_2,$$

revealing that $v_2 = i$, so

$$u = \begin{pmatrix} 1 \\ i \end{pmatrix},$$

which we must (always) normalize by projecting onto the unit (“projective”) sphere.

$$u = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} = \frac{|+\rangle + i\,|-\rangle}{\sqrt{2}}.$$

The last equality is the expression of u explicitly in terms of the z-basis $\{\,|+\rangle, |-\rangle\,\}$.


Alternate Method. We got lucky in that once we substituted 1 for $v_1$, we were
able to read off $v_2$ immediately. Sometimes, the equation is messier, and we need to
do a little work. In that case, naming the real and imaginary parts of $v_2$ helps:

$$v_2 = a + bi,$$

and substituting this into the original equations containing $v_2$, above, gives

$$-i\,(a + bi) = 1$$

$$i = (a + bi),$$

or

$$b - ai = 1$$

$$i = a + bi.$$

Let’s solve the second equation for a, then substitute into the first, as in

$$a = i - bi \;\;\Longrightarrow\;\; b - (i - bi)\,i = 1 \;\;\Longrightarrow\;\; 1 = 1.$$

What does this mean? It means that we get a very agreeable second equation; the b
disappears resulting in a true identity (a tautology to the logician). We can, therefore
let b be anything. Again, when given a choice, choose 1. So b = 1 and substituting
that into any of the earlier equations gives a = 0. Thus,

$$u = \begin{pmatrix} 1 \\ a + bi \end{pmatrix} = \begin{pmatrix} 1 \\ 0 + 1 \cdot i \end{pmatrix} = \begin{pmatrix} 1 \\ i \end{pmatrix},$$

the same result we got instantly the first time. We would then go on to normalize u
as before.


Wrong Guess for $v_1$? If, however, after substituting for a and solving the first
equation, b disappeared and produced a falsehood (like 0 = 1), then no b would be
suitable. That would mean our original choice of $v_1 = 1$ was not defensible; $v_1$ could
not have been a non-zero value. We would simply change that assumption, set $v_1 = 0$
and go on to solve for $v_2$ (either directly or by solving for a and b to get it). This
time, we would be certain to get a solution. In fact, any time you end up facing a
contradiction (3 = 4) instead of a tautology (7 = 7), then your original guess for $v_1$
has to be changed. Just redefine $v_1$ (if you chose 1, change it to 0) and everything
will work out.
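The guess-and-retry procedure can be sketched in code for the 2x2 case. This is only an illustration under stated assumptions: the function name `eigenvector_2x2` and the tolerance are mine, the matrices are $S_y$ and $S_z$ in units of $\hbar/2$, and the caller is trusted to pass a genuine eigenvalue.

```python
def eigenvector_2x2(m, lam, tol=1e-12):
    """Return one (unnormalized) solution v of m v = lam v for a 2x2 matrix m.

    Assumes lam really is an eigenvalue of m.
    """
    a, b = m[0][0] - lam, m[0][1]
    c, d = m[1][0], m[1][1] - lam
    # Guess v1 = 1. The two rows then read  a + b*v2 = 0  and  c + d*v2 = 0.
    for const, coeff in ((a, b), (c, d)):
        if abs(coeff) > tol:
            return [1, -const / coeff]
    # Neither row involves v2, so the rows read  a = 0  and  c = 0.
    if abs(a) < tol and abs(c) < tol:
        return [1, 0]      # tautologies: v2 is free, so pick 0
    # A row reads "non-zero = 0", a contradiction: the guess v1 = 1 was wrong.
    return [0, 1]

SIGMA_Y = [[0, -1j], [1j, 0]]   # S_y in units of hbar/2
SIGMA_Z = [[1, 0], [0, -1]]     # S_z in units of hbar/2

print(eigenvector_2x2(SIGMA_Y, 1))    # first coordinate 1, second coordinate i
print(eigenvector_2x2(SIGMA_Z, -1))   # the guess v1 = 1 fails, so v1 = 0
```

The second call is the "wrong guess" situation: with $v_1 = 1$ the first row reads 2 = 0, so the sketch falls back to $v_1 = 0$ and returns the second basis vector.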




Too Many Solutions? (Optional Reading) In our well-behaved spin-1/2 state 
space, each eigenvalue determines a single ray in the state-space, so it only takes one 
unit eigenvector to describe it; you might say that the subspace of H spanned by the 
eigenvectors corresponding to each eigenvalue is 1-dimensional. All of its eigenvectors
differ by a scalar factor. But in other physical systems the eigenvector equation related
to a single eigenvalue may yield too many solutions (even after accounting for the 
scalar multiples on the same ray we already know about). In other words, there 
may be multiple linearly-independent solutions to the equation for one eigenvalue. If 
so, we select an orthonormal set of eigenvectors that correspond to the degenerate 
eigenvalue, as follows. 


1. First observe (you can prove this as an easy [exercise]) that the set of all
eigenvectors belonging to that eigenvalue form a vector subspace of the state 
space. 


2. Use basic linear algebra to find any basis for this subspace. 


3. Apply a process named “Gram-Schmidt” to replace that basis by an orthonor- 
mal basis. 


Repeat this for any eigenvalue that is degenerate. You will get an optional exercise 
that explains why we want an orthonormal basis for the eigenspace of a degenerate
eigenvalue. 
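For readers who have not met Gram-Schmidt, here is a minimal sketch of step 3 in plain Python. The function names and the two sample vectors (standing in for a basis of a hypothetical 2-dimensional eigenspace inside a 3-dimensional state space) are my own choices for illustration.

```python
import math

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Replace linearly independent vectors by an orthonormal list
    spanning the same subspace."""
    ortho = []
    for v in basis:
        w = list(v)
        for e in ortho:
            coeff = inner(e, w)                      # component of w along e
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w).real)
        ortho.append([wi / norm for wi in w])
    return ortho

# Hypothetical (non-orthogonal) basis of a degenerate eigenspace.
e1, e2 = gram_schmidt([[1, 1, 0], [1, 0, 1j]])
print(abs(inner(e1, e2)) < 1e-12)   # orthogonal
print(abs(inner(e1, e1) - 1) < 1e-12, abs(inner(e2, e2) - 1) < 1e-12)   # unit length
```

Each new vector has its components along the already-accepted vectors subtracted off and is then normalized, which is all the process amounts to.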


Eigenvector for $(-\hbar/2)$. Now it’s your turn.


[Exercise. Find the eigenvector for the negative eigenvalue.]


7.6.2 The Eigenvalues and Eigenvectors of $S_x$


Roll up your sleeves and do it all for the third primary direction.
[Exercise. Compute the eigenvalues and eigenvectors for the observable $S_x$.]


Congratulations. You are now “doing” full-strength quantum mechanics (not 
quantum physics, but, yes, quantum mechanics). 


7.6.3. Summary of Eigenvectors and Eigenvalues for Spin-1/2 
Observables 


Spoiler Alert. These are the results of the two last exercises. 




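The eigenvalues and eigenvectors for $S_z$, $S_x$, and $S_y$ are:


$$\begin{array}{lll}
S_z: & +\dfrac{\hbar}{2} \;\leftrightarrow\; \begin{pmatrix} 1 \\ 0 \end{pmatrix}, & -\dfrac{\hbar}{2} \;\leftrightarrow\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \\[2ex]
S_x: & +\dfrac{\hbar}{2} \;\leftrightarrow\; \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, & -\dfrac{\hbar}{2} \;\leftrightarrow\; \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \\[2ex]
S_y: & +\dfrac{\hbar}{2} \;\leftrightarrow\; \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, & -\dfrac{\hbar}{2} \;\leftrightarrow\; \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}
\end{array}$$


Expressed explicitly in terms of the z-basis vectors we find


$$\begin{array}{lll}
S_z: & +\dfrac{\hbar}{2} \;\leftrightarrow\; |+\rangle, & -\dfrac{\hbar}{2} \;\leftrightarrow\; |-\rangle \\[2ex]
S_x: & +\dfrac{\hbar}{2} \;\leftrightarrow\; |+\rangle_x = \dfrac{|+\rangle + |-\rangle}{\sqrt{2}}, & -\dfrac{\hbar}{2} \;\leftrightarrow\; |-\rangle_x = \dfrac{|+\rangle - |-\rangle}{\sqrt{2}} \\[2ex]
S_y: & +\dfrac{\hbar}{2} \;\leftrightarrow\; |+\rangle_y = \dfrac{|+\rangle + i\,|-\rangle}{\sqrt{2}}, & -\dfrac{\hbar}{2} \;\leftrightarrow\; |-\rangle_y = \dfrac{|+\rangle - i\,|-\rangle}{\sqrt{2}}
\end{array}$$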


We saw the x-kets and y-kets before when we were trying to make sense out of the
50-50 split of a $|-\rangle_x$ state into the two z-states $|+\rangle$ and $|-\rangle$. Now, the
expressions re-emerge as the result of a rigorous calculation of the eigenvectors for the
observables $S_x$ and $S_y$. Evidently, the eigenvectors of $S_x$ are the same two vectors
that you showed (in an exercise) were an alternative orthonormal basis for $\mathcal{H}$, and
likewise for the eigenvectors of $S_y$.


Using these expressions along with the distributive property of inner products, it 
is easy to show orthonormality relations like 


$${}_y\langle +|+\rangle_y \;=\; 1 \qquad\text{or}\qquad {}_x\langle +|-\rangle_x \;=\; 0 .$$


[Exercise. Prove the above two equalities as well as the remaining combinations
that demonstrate that the x-basis and the y-basis are each (individually) orthonormal.]
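A numerical check is no substitute for the proof the exercise asks for, but it is a handy way to catch sign or conjugation slips. The sketch below, in plain Python, uses the z-coordinates of the x-kets and y-kets from the summary table; the helper names `inner` and `is_orthonormal` are assumptions of mine.

```python
import math

S = 1 / math.sqrt(2)
X_PLUS, X_MINUS = [S, S], [S, -S]              # x-kets in z-coordinates
Y_PLUS, Y_MINUS = [S, S * 1j], [S, -S * 1j]    # y-kets in z-coordinates

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def is_orthonormal(basis, tol=1e-12):
    """True if <u_j|u_k> is 1 when j == k and 0 otherwise."""
    for j, u in enumerate(basis):
        for k, v in enumerate(basis):
            expected = 1 if j == k else 0
            if abs(inner(u, v) - expected) > tol:
                return False
    return True

print(is_orthonormal([X_PLUS, X_MINUS]))   # x-basis
print(is_orthonormal([Y_PLUS, Y_MINUS]))   # y-basis
print(is_orthonormal([X_PLUS, Y_PLUS]))    # a mixed pair is not orthogonal
```

Forgetting the conjugate on the left vector makes the y-basis check fail, which is exactly the mistake the exercise is designed to expose.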


7.7 Observables and Orthonormal Bases 


Thus far, we have met a simple 2-dimensional Hilbert space, $\mathcal{H}$, whose vectors
correspond to states of a spin-1/2 physical system, $\mathscr{S}$. The correspondence is not
1-to-1, since an entire ray of vectors in $\mathcal{H}$ corresponds to the same physical state,
but we are learning to live with that by remembering that we can (and will) normalize all
vectors whenever we want to see a proper representative of that ray on the unit
(projective) sphere. In addition, we discovered that our three most common observables,
$S_x$, $S_y$ and $S_z$, correspond to operators whose eigenvalues are all $\pm\frac{\hbar}{2}$ and
whose eigenvectors form three different 2-vector bases for the 2-dimensional $\mathcal{H}$.
Each of the bases is an orthonormal basis.


I’d like to distill two observations and award them collectively the title of trait.


7.7.1 Trait #4 (Real Eigenvalues and Orthonormal Eigen- 
vectors) 


An observable $A$ in $\mathscr{S}$ will always correspond to an operator


1. whose eigenvalues, $a_1, a_2, \ldots$, are real, and
2. whose eigenvectors, $\mathbf{u}_{a_1}, \mathbf{u}_{a_2}, \ldots$, form an orthonormal basis for
the state space $\mathcal{H}$.


I did not call this trait a “postulate,” because it isn’t; you can prove these two
properties based on Trait #2, which connects observables to Hermitian operators.
Although we won’t spend our limited time proving it, if you are interested, try the 
next few exercises. 


Note. I wrote the trait as if all the eigenvalues were non-degenerate. It is still true, 
even for degenerate eigenvalues, although then we would have to label the eigenvectors 
more carefully. 


[Exercise. Show that a Hermitian operator’s eigenvalues are real.] 


[Exercise. Show that eigenvectors corresponding to distinct eigenvalues are
orthogonal.]


[Exercise. In an optional passage, above, I mentioned that a degenerate eigenvalue
determines not a single eigenvector, but a vector subspace of eigenvectors, from
which we can always select an orthonormal basis. Use this fact, combined with the
last exercise to construct a complete orthonormal set of eigenvectors, including those
that are non-degenerate (whose eigenspace is one-dimensional) and those that are
degenerate (whose eigenspace requires multiple vectors to span it).]


I told a small white lie a moment ago when I said that this is totally provable 
from the second postulate (Trait #2). There is one detail that I left out of the 
second postulate which is needed to prove these observations. You did not possess 
the vocabulary to understand it at the time, but now you do. In Trait #2 I said 
that the observable had to correspond to a Hilbert-space operator. What I left out 
was the following mini-trait, which I'll call 


Trait #2a (Completeness of the Eigenbasis): The eigenvectors of an observable
span (and since they are orthogonal, constitute a basis for) the state space.
Furthermore, every measurable quantity of the physical system $\mathscr{S}$ corresponds to an
observable, thus its eigenvectors can be chosen as a basis whenever convenient.


If we had an observable whose eigenvectors turned out not to span the state
space, we would have done a bad job of defining the state space and would have to go
back and figure out a better mathematical Hilbert space to model $\mathscr{S}$. Similarly, if
there were a measurable quantity for which we could not identify a linear operator, we
would not have properly modeled $\mathscr{S}$.


7.7.2 Using the x- or y-Basis


While we’ve been doggedly using the z-axis kets to act as a basis for our 2-dimensional 
state space, we know we can use the x-kets or y-kets. Let’s take the x-kets as an 
example. 


First and most easily, when we express any basis vector in its own coordinates,
the results always look like natural basis coordinates, i.e., $(0, 1)^t$ or $(1, 0)^t$. This was
an exercise back in our linear algebra lecture, but you can take a moment to digest
it again. So, if we were to switch to using the x-basis for our state space vectors we
would surely see that

$$|+\rangle_x \;=\; \begin{pmatrix} 1 \\ 0 \end{pmatrix}_x \qquad\text{and}\qquad |-\rangle_x \;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix}_x .$$


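How would our familiar z-kets now look in this new basis? You can do this by starting
with the expressions for the x-kets in terms of $|+\rangle$ and $|-\rangle$ that we already have,

$$|+\rangle_x \;=\; \frac{|+\rangle + |-\rangle}{\sqrt{2}} \qquad\text{and}\qquad |-\rangle_x \;=\; \frac{|+\rangle - |-\rangle}{\sqrt{2}} ,$$

and solve for the z-kets in terms of the x-kets. It turns out that doing so results in
déjà vu,

$$|+\rangle \;=\; \frac{|+\rangle_x + |-\rangle_x}{\sqrt{2}} \qquad\text{and}\qquad |-\rangle \;=\; \frac{|+\rangle_x - |-\rangle_x}{\sqrt{2}} .$$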


It’s a bit of a coincidence, and this symmetry is not quite duplicated when we express
the y-kets in terms of $|+\rangle_x$ and $|-\rangle_x$. I’ll do one, and you can do the other as an
exercise.

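$|+\rangle_y$ in the x-Basis. The approach is to first write down $|+\rangle_y$ in terms of
$|+\rangle$ and $|-\rangle$ (already known), then replace those two z-basis kets with their
x-representation, shown above. Here we go. (The symbol $\cong$ marks a step where the
vector is rescaled by a complex scalar, which keeps it on the same ray.)

$$\begin{aligned}
|+\rangle_y \;&=\; \frac{|+\rangle + i\,|-\rangle}{\sqrt{2}}
 \;=\; \frac{1}{\sqrt{2}}\left( \frac{|+\rangle_x + |-\rangle_x}{\sqrt{2}} \;+\; i\,\frac{|+\rangle_x - |-\rangle_x}{\sqrt{2}} \right) \\[1ex]
 &=\; \frac{(1+i)\,|+\rangle_x + (1-i)\,|-\rangle_x}{2} \\[1ex]
 &\cong\; (1-i)\,\frac{(1+i)\,|+\rangle_x + (1-i)\,|-\rangle_x}{2}
 \;=\; \frac{2\,|+\rangle_x - 2i\,|-\rangle_x}{2}
 \;=\; |+\rangle_x - i\,|-\rangle_x \\[1ex]
 &\cong\; \frac{|+\rangle_x - i\,|-\rangle_x}{\sqrt{2}} .
\end{aligned}$$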
[Exercise. Show that

$$|-\rangle_y \;\cong\; \frac{|+\rangle_x + i\,|-\rangle_x}{\sqrt{2}} .]$$


[Exercise. Express $|+\rangle$ and $|+\rangle_x$ in the y-basis.]
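If you want to double-check a change-of-basis result like the expansion of $|+\rangle_y$ along the x-kets, one option is to compute the two x-coordinates numerically and test whether they lie on the same ray as the claimed answer. The sketch below is plain Python; the helper `same_ray` and its cross-product test are my own device, not something from the course.

```python
import math

S = 1 / math.sqrt(2)
X_PLUS, X_MINUS = [S, S], [S, -S]    # x-kets in z-coordinates
Y_PLUS = [S, S * 1j]                 # |+>_y in z-coordinates

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def same_ray(u, v, tol=1e-12):
    """Two non-zero 2-vectors are scalar multiples iff u1*v2 - u2*v1 = 0."""
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# x-coordinates of |+>_y: the inner products with |+>_x and |->_x.
coords = [inner(X_PLUS, Y_PLUS), inner(X_MINUS, Y_PLUS)]
claimed = [S, -S * 1j]               # (|+>_x - i|->_x)/sqrt(2) in x-coordinates

print(coords)
print(same_ray(coords, claimed))
```

The raw coordinates differ from the claimed ones by an overall complex factor, which is why the comparison is made ray-to-ray rather than entry-by-entry.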


7.7.3 General States Expressed in Alternate Bases 


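Expressing $|+\rangle$ or $|-\rangle$ in a different basis, like the x-basis, is just a special
case of something you already know about. Any vector, whose coordinates are given in the
preferred basis, like


$$|\psi\rangle \;=\; \alpha\,|+\rangle + \beta\,|-\rangle \;=\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix},$$


has an alternate representation when its coefficients are expanded along a different
basis, say the x-basis,


$$|\psi\rangle \;=\; \alpha'\,|+\rangle_x + \beta'\,|-\rangle_x \;=\; \begin{pmatrix} \alpha' \\ \beta' \end{pmatrix}_x .$$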


If we have everything expressed in the preferred basis, then one day we find ourselves 
in need of all our vectors’ names in the alternate (say x) basis, how do we do it? 
Well there is one way I didn’t develop (yet) and it involves finding a simple matrix 
by which you could multiply all the preferred vectors to produce their coefficients 
expanded along the alternate basis. But we usually don’t need this matrix. Since 
we are using orthonormal bases, we can convert to the alternate coordinates directly 
using the dot-with-the-basis vector trick. Typically, we only have one or two vectors 
whose coordinates we need in this alternate basis, so we just apply that trick. 


For example, to get a state vector, $|\psi\rangle$, expressed in the x-basis, we just form the
two inner-products,


$$|\psi\rangle \;=\; \begin{pmatrix} {}_x\langle +|\psi\rangle \\ {}_x\langle -|\psi\rangle \end{pmatrix},$$


where the syntax


$${}_x\langle +|\psi\rangle$$


means we are taking the complex inner product of $|\psi\rangle$ on the right with the x-basis
ket $|+\rangle_x$ on the left. (Don’t forget that we have to take the Hermitian conjugate of
the left vector for complex inner products.)


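We are implying by context that the column vector on the RHS is expressed in
x-coordinates since that’s the whole point of the paragraph. But if we want to be
super explicit, we could write it as

$$|\psi\rangle \;=\; \begin{pmatrix} {}_x\langle +|\psi\rangle \\ {}_x\langle -|\psi\rangle \end{pmatrix}_{x} \qquad\text{or}\qquad \left.\begin{pmatrix} {}_x\langle +|\psi\rangle \\ {}_x\langle -|\psi\rangle \end{pmatrix}\right|_{x} ,$$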


with or without the long vertical line, depending on author and time-of-day. 


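Showing the same thing in terms of the x-kets explicitly, we get


$$\begin{aligned}
|\psi\rangle \;&=\; {}_x\langle +|\psi\rangle\,|+\rangle_x \;+\; {}_x\langle -|\psi\rangle\,|-\rangle_x \\[1ex]
 &=\; \left[ \left( \tfrac{1}{\sqrt{2}},\; \tfrac{1}{\sqrt{2}} \right)\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \right] |+\rangle_x \;+\; \left[ \left( \tfrac{1}{\sqrt{2}},\; -\tfrac{1}{\sqrt{2}} \right)\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \right] |-\rangle_x .
\end{aligned}$$


Notice that the coordinates of the three vectors, $|\pm\rangle_x$ and $|\psi\rangle$, are expressed in
the preferred z-basis. We can compute inner products in any orthonormal basis, and since
we happen to know everything in the z-basis, why not? Try not to get confused. We
are looking for the coordinates in the x-basis, so we need to “dot” with the x-basis
vectors, but we use z-coordinates of those vectors (and the z-coordinates of state
$|\psi\rangle$) to compute those two scalars.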


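Example. The (implied z-spin) state vector


$$|\psi\rangle \;=\; \begin{pmatrix} \dfrac{1+i}{\sqrt{6}} \\[2ex] -\dfrac{\sqrt{2}}{\sqrt{3}} \end{pmatrix}$$


has the x-spin coordinates,


$$\begin{pmatrix} {}_x\langle +|\psi\rangle \\ {}_x\langle -|\psi\rangle \end{pmatrix}_{x} \;=\; \begin{pmatrix} \left( \tfrac{1}{\sqrt{2}},\; \tfrac{1}{\sqrt{2}} \right)|\psi\rangle \\[1.5ex] \left( \tfrac{1}{\sqrt{2}},\; -\tfrac{1}{\sqrt{2}} \right)|\psi\rangle \end{pmatrix}_{x} .$$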


[Exercise. Compute and simplify. ] 
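The dot-with-the-basis-vector trick translates almost word for word into code, which gives you something to compare your hand simplification against. This is a sketch in plain Python; `to_basis` is a name I made up, and everything (the x-kets and the state) is entered in z-coordinates, as the text recommends.

```python
import math

S = 1 / math.sqrt(2)
X_PLUS, X_MINUS = [S, S], [S, -S]    # x-basis kets, written in z-coordinates

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def to_basis(state, basis):
    """Coordinates of `state` along an orthonormal `basis`
    (state and basis vectors all given in z-coordinates)."""
    return [inner(e, state) for e in basis]

psi = [(1 + 1j) / math.sqrt(6), -math.sqrt(2) / math.sqrt(3)]
alpha_x, beta_x = to_basis(psi, [X_PLUS, X_MINUS])
print(alpha_x, beta_x)

# Rebuilding the state from its x-coordinates should give back psi.
rebuilt = [alpha_x * X_PLUS[k] + beta_x * X_MINUS[k] for k in range(2)]
print(all(abs(rebuilt[k] - psi[k]) < 1e-12 for k in range(2)))
```

The rebuild step at the end works only because the x-basis is orthonormal; for a non-orthonormal basis the inner products would not be the expansion coefficients.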




7.8 The Completeness (or Closure) Relation 


7.8.1 Orthonormal Bases in Higher Dimensions 


Our casual lesson about conversion from the z-basis to the x-basis has brought us to
one of the most computationally useful tools in quantum mechanics. It’s something
that we use when doodling with pen and paper to work out problems and construct
algorithms, so we don’t want to miss the opportunity to establish it formally, right
now. We just saw that


$$|\psi\rangle \;=\; {}_x\langle +|\psi\rangle\,|+\rangle_x \;+\; {}_x\langle -|\psi\rangle\,|-\rangle_x ,$$


which was true because $\{|+\rangle_x, |-\rangle_x\}$ formed an orthonormal basis for our state space.


In systems with higher dimensions we have a more general formula. Where do we
get state spaces that have dimensions higher than two? A spin-1 system (photons)
has 3 dimensions; a spin-3/2 system (the delta particle) has 4 dimensions; and later,
when we get into multi-qubit systems, we’ll be taking tensor products of our humble
2-dimensional $\mathcal{H}$ to form 8-dimensional or larger state spaces. And the state spaces
that model position and momentum are infinite dimensional (but don’t let that scare
you; they are actually just as easy to work with, and we use integrals instead of
sums).


If we have an n-dimensional state space then we would have an orthonormal basis
for that space, say,

$$\Big\{\, |u_1\rangle,\; |u_2\rangle,\; \ldots,\; |u_n\rangle \,\Big\} .$$


The $|u_k\rangle$ basis may or may not be a preferred basis; it doesn’t matter. Using the
dot-product trick we can always expand any state in that space, say $|\psi\rangle$, along the
basis just like we did for the x-basis in 2-dimensions. Only, now, we have a larger
sum


$$|\psi\rangle \;=\; \sum_{k=1}^{n} \langle u_k|\psi\rangle\, |u_k\rangle .$$


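This is a weighted sum of the $u_k$-kets by the scalars $\langle u_k|\psi\rangle$. There is no law that
prevents us from placing the scalars on the right side of the vectors, as in


$$|\psi\rangle \;=\; \sum_{k=1}^{n} |u_k\rangle\, \langle u_k|\psi\rangle .$$


Now, we can take the $|\psi\rangle$ out of the sum


$$|\psi\rangle \;=\; \left( \sum_{k=1}^{n} |u_k\rangle\langle u_k| \right) |\psi\rangle .$$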




Look at what we have. We are subjecting any state vector $|\psi\rangle$ to something that
looks like an operator and getting that same state vector back again. In other words,
that fancy looking operator-sum is nothing but an identity operator $\mathbb{1}$.


$$\sum_{k=1}^{n} |u_k\rangle\langle u_k| \;=\; \mathbb{1} .$$

This simple relation, called the completeness or closure relation, can be applied by 
inserting the sum into any state equation without changing it, since it is the same as 
inserting an identity operator (an identity matrix) into an equation involving vectors. 
We'll use it a little in this course, CS 83A, and a lot in the next courses CS 83B and 
CS 83C, here at Foothill College. 


[Exercise. Explain how the sum, $\sum_k |u_k\rangle\langle u_k|$, is, in fact, a linear transformation
that can act on a vector $|\psi\rangle$. Hint: After applying it to $|\psi\rangle$ and distributing, each
term is just an inner-product (resulting in a scalar) times a vector. Thus, you can
analyze a simple inner product first and later take the sum, invoking the properties
of linearity.]


This is worthy of its own trait. 


7.8.2 Trait #5 (Closure Relation) 


Any orthonormal basis $\{\, |u_k\rangle \,\}$ for our Hilbert space $\mathcal{H}$ satisfies the closure relation


$$\sum_{k=1}^{n} |u_k\rangle\langle u_k| \;=\; \mathbb{1} .$$


In particular, the eigenvectors of an observable will always satisfy the closure relation. 


We won’t work any examples as this will take us too far afield, and in our simple
2-dimensional $\mathcal{H}$ there’s not much to see, but we now have it on-the-record.
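For the curious, the closure relation is easy to confirm numerically in our 2-dimensional space. The following sketch in plain Python builds each $|u_k\rangle\langle u_k|$ as a 2x2 matrix and adds them up; `outer` and `closure_sum` are illustrative names of my own.

```python
import math

S = 1 / math.sqrt(2)
X_BASIS = [[S, S], [S, -S]]              # |+>_x, |->_x in z-coordinates
Y_BASIS = [[S, S * 1j], [S, -S * 1j]]    # |+>_y, |->_y in z-coordinates

def outer(u, v):
    """Matrix of |u><v|: entry (r, c) is u_r times the conjugate of v_c."""
    return [[ur * complex(vc).conjugate() for vc in v] for ur in u]

def closure_sum(basis):
    """Add up the matrices |u_k><u_k| over all vectors in the basis."""
    n = len(basis)
    total = [[0j] * n for _ in range(n)]
    for u in basis:
        proj = outer(u, u)
        for r in range(n):
            for c in range(n):
                total[r][c] += proj[r][c]
    return total

print(closure_sum(X_BASIS))   # numerically the 2x2 identity matrix
print(closure_sum(Y_BASIS))   # likewise
```

Notice that the individual terms are not the identity (each one is a projection onto a single basis ket); only the full sum is.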


7.9 The Fourth Postulate of Quantum Mechanics 


There are two versions of this one, and we only need the simpler of the two which 
holds in the case of non-degenerate eigenvalues, the situation that we will be using for 
our spin-1/2 state spaces. (Also, you’ll note our traits are no longer in-sync with the 
postulates, an inevitable circumstance since there are more traits than postulates.) 


7.9.1 Trait #6 (Probability of Outcomes) 


If a system is in the (normalized) state $|\psi\rangle$, and this state is expanded along the
eigenbasis $\{\, |u_k\rangle \,\}$ of some observable, $A$, i.e.,


$$|\psi\rangle \;=\; \sum_{k=1}^{n} c_k\, |u_k\rangle ,$$


then the probability that a measurement of $A$ will yield a non-degenerate eigenvalue
$a_k$ (associated with the eigenvector $|u_k\rangle$) is $|c_k|^2$.


Vocabulary. The expansion coefficients, $c_k$, for state $|\psi\rangle$, are often referred to as
amplitudes by physicists.


In this language, the probability of obtaining a measurement outcome $a_k$ for
observable $A$ is the magnitude-squared of the amplitude $c_k$ standing next to the
eigenvector $|u_k\rangle$ associated with the outcome $a_k$.


The complex coordinates of the state determine the statistical outcome of repeated 
experimentation. This is about as quantum mechanical a concept as there is. It tells 
us the following. 


• If a state is a superposition (non-trivial linear combination) of two or more
eigenkets, we cannot know an outcome of a quantum measurement with certainty.


• This is not a lack of knowledge about the system, but a statement about what
it means to know everything knowable about the system, namely the full
description of the state.


• The coefficient (or amplitude) $c_k$ gives you the probability of the outcome $a_k$,
namely


$$P(a_k)_{|\psi\rangle} \;=\; c_k^{*}\, c_k \;=\; |c_k|^2 ,$$


to be read, “when a system in state $|\psi\rangle$ is measured, its probability of outcome,
$a_k$, is the square magnitude of $c_k$.” The observable in question is understood to
be the one whose eigenvalues-eigenvectors are $\{\, a_k \leftrightarrow |u_k\rangle \,\}$.


Probability Example 1 


Let’s analyze our original Experiment #2 (last optional chapter), in which we
started with a group of electrons in the +z state, i.e., $|+\rangle$, and tested each of their
spin projections along the x-axis, i.e., we measured the observable $S_x$. In order to
predict the probabilities of the outcome of the $S_x$ measurement using the fourth
QM postulate (our Trait #6) we expand the pre-measurement state along the
eigenvector basis for $S_x$ and then examine each coefficient. The pre-measurement
state is $|+\rangle$. We have previously expressed that state in terms of the $S_x$ eigenvectors,
and the result was

$$|+\rangle \;=\; \frac{|+\rangle_x + |-\rangle_x}{\sqrt{2}} .$$


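The amplitude of each of the two possible outcome states (the two eigenstates $|+\rangle_x$
and $|-\rangle_x$) is $\frac{1}{\sqrt{2}}$. This tells us that each eigenstate outcome is equally likely
and, in fact, determined by


$$P\left( S_x = +\tfrac{\hbar}{2} \right)_{|+\rangle} \;=\; \left( \frac{1}{\sqrt{2}} \right)^{*} \left( \frac{1}{\sqrt{2}} \right) \;=\; \frac{1}{2} ,$$


while


$$P\left( S_x = -\tfrac{\hbar}{2} \right)_{|+\rangle} \;=\; \left( \frac{1}{\sqrt{2}} \right)^{*} \left( \frac{1}{\sqrt{2}} \right) \;=\; \frac{1}{2} .$$
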
Notice that the two probabilities add to 1. Is this a happy coincidence? I think not.
The first postulate of QM (our Trait #1) guarantees that we are using unit vectors
to correspond to system states. If we had a non-unit vector that was supposed to
represent that state, we’d need to normalize it first before attempting to compute the
probabilities.
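Trait #6 boils down to a two-line recipe: normalize, then take the magnitude-squared of each inner product with an eigenvector. Here is a hedged sketch in plain Python; the function name `outcome_probabilities` is an assumption of mine, and the eigenvectors are supplied in z-coordinates.

```python
import math

S = 1 / math.sqrt(2)
X_BASIS = [[S, S], [S, -S]]   # eigenvectors of S_x for +hbar/2 and -hbar/2

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def outcome_probabilities(state, eigenbasis):
    """Return |c_k|^2 for each eigenvector u_k, where c_k = <u_k|state>.

    The state is normalized first, so non-unit input vectors are allowed.
    """
    norm = math.sqrt(inner(state, state).real)
    unit = [x / norm for x in state]
    return [abs(inner(u, unit)) ** 2 for u in eigenbasis]

print(outcome_probabilities([1, 0], X_BASIS))   # state |+> measured along x
print(outcome_probabilities([2, 0], X_BASIS))   # same ray, not normalized
```

Both calls describe the same physical state, so they produce the same pair of probabilities, each one half up to rounding.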


Probability Example 2 


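The follow-up to Experiment #2 was the most shocking to us, so we should see
how it is predicted by Trait #6. The starting point for this measurement was the
output of the second apparatus, specifically, the −x group: $|-\rangle_x$. We then measured
$S_z$, so we need to know the coefficients of the state $|-\rangle_x$ along the $S_z$ eigenbasis. We
computed this already, and they are contained in


$$|-\rangle_x \;=\; \frac{|+\rangle - |-\rangle}{\sqrt{2}} .$$


The arithmetic we just did works exactly the same here, and our predictions are the
same:


$$P\left( S_z = +\tfrac{\hbar}{2} \right)_{|-\rangle_x} \;=\; \left( \frac{1}{\sqrt{2}} \right)^{*} \left( \frac{1}{\sqrt{2}} \right) \;=\; \frac{1}{2}$$


and


$$P\left( S_z = -\tfrac{\hbar}{2} \right)_{|-\rangle_x} \;=\; \left( \frac{-1}{\sqrt{2}} \right)^{*} \left( \frac{-1}{\sqrt{2}} \right) \;=\; \frac{1}{2} .$$

Notice that, despite the amplitude’s negative signs, the probabilities still come out
non-negative.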


[Exercise. Analyze the $S_y$ measurement probabilities of an electron in the state
$|-\rangle$. Be careful. This time we have a complex number to conjugate.]


Probability Example 3 


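The z-spin coordinates of a state, $|\psi\rangle$, are given by

$$|\psi\rangle \;=\; \begin{pmatrix} \dfrac{1+i}{\sqrt{6}} \\[2ex] -\dfrac{\sqrt{2}}{\sqrt{3}} \end{pmatrix} .$$


The probability of detecting a z-UP spin is given by the coefficient (amplitude) of
the $|+\rangle$, and is

$$P\left( S_z = +\tfrac{\hbar}{2} \right)_{|\psi\rangle} \;=\; \left( \frac{1+i}{\sqrt{6}} \right)^{*} \left( \frac{1+i}{\sqrt{6}} \right) \;=\; \left( \frac{1-i}{\sqrt{6}} \right) \left( \frac{1+i}{\sqrt{6}} \right) \;=\; \frac{2}{6} \;=\; \frac{1}{3} .$$

The probability of detecting an x-DOWN spin starting with that same $|\psi\rangle$ requires
that we project that state along the x-basis,


$$|\psi\rangle \;=\; c_+\,|+\rangle_x \;+\; c_-\,|-\rangle_x .$$


However, since we only care about the x-down state, we can just compute the $|-\rangle_x$
coefficient, which we do using the dot-product trick.


$$\begin{aligned}
c_- \;=\; {}_x\langle -|\psi\rangle \;&=\; \left( \tfrac{1}{\sqrt{2}},\; -\tfrac{1}{\sqrt{2}} \right) \begin{pmatrix} \dfrac{1+i}{\sqrt{6}} \\[2ex] -\dfrac{\sqrt{2}}{\sqrt{3}} \end{pmatrix}
 \;=\; \frac{1+i}{\sqrt{12}} + \frac{1}{\sqrt{3}} \\[1ex]
 &=\; \frac{1+i}{\sqrt{12}} + \frac{2}{\sqrt{12}} \;=\; \frac{3+i}{\sqrt{12}} .
\end{aligned}$$


Now we take the magnitude squared,


$$P\left( S_x = -\tfrac{\hbar}{2} \right)_{|\psi\rangle} \;=\; c_-^{*}\, c_- \;=\; \left( \frac{3-i}{\sqrt{12}} \right) \left( \frac{3+i}{\sqrt{12}} \right) \;=\; \frac{10}{12} \;=\; \frac{5}{6} .$$
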
[Exercise. Compute the −z and +x spin probabilities for this $|\psi\rangle$ and confirm
that they complement their respective partners that we computed above. Explain
what I mean by “complementing their partners.”]


[Exercise. Compute the +y and −y spin probabilities for this $|\psi\rangle$.]


7.10 The Fifth Postulate of Quantum Mechanics 


After the first experiment of the last lesson, in which we measured the z-spin of
random electrons and noted that the measurement caused them to split into two
equal groups, one having all z-up states and the other having all z-down states,
we did a follow-up. We tested $S_z$ again on each output group separately. In case
you missed it, here’s what happened. When we re-tested the $|+\rangle$ group, the results
always gave us +z readings, and when we re-tested the $|-\rangle$ group we always got −z
readings. It was as if, after the first measurement had divided our electrons into two
equal groups of +z and −z spins, any subsequent tests suggested that the two groups
were frozen into their two respective states, as long as we only tested $S_z$.


However, the moment we tested a different observable, like $S_x$, we disturbed that
z-axis predictability. So the collapse of the system into the $S_z$ state was only valid as
long as we continued to test that one observable. This is our next trait.




7.10.1 Trait #7 (Post-Measurement Collapse) 


If the measurement of an observable of system $\mathscr{S}$ results in the eigenvalue, $a_k$, then
the system “collapses” into the (an) eigenvector $|u_k\rangle$, associated with $a_k$. Further
measurements on this collapsed state yield the eigenvalue $a_k$ with 100% certainty.


Vocabulary Review. The eigenvectors of an observable are also known as eigenkets.


We’ve been saying all along that the eigenvalue $a_k$ might be degenerate. The
implication here is that there may be more than one possibility for the collapsed state,
specifically, any of the eigenvectors $|u_k'\rangle, |u_k''\rangle, |u_k'''\rangle, \ldots$ which correspond to
$a_k$. We won’t encounter this situation immediately, but it will arise later in the course.
The early easy cases will consist of non-degenerate eigenvalues whose probabilities are
easily computed by the amplitudes of their respective unique eigenvectors. Later, when
we get to degenerate eigenvalues, we won’t be sure which of the eigenvectors
corresponding to that eigenvalue represents the state into which the system collapsed.
Yet, even knowing that it collapsed to the small subset of eigenvectors corresponding
to a single eigenvalue (in the degenerate case) is invaluable information that plays
directly into our algorithms.
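To see Traits #6 and #7 working together, here is a toy simulation in plain Python for the non-degenerate case only. The function `measure` is a hypothetical helper of mine: it draws an outcome with the Trait #6 probabilities and then returns the corresponding eigenvector as the collapsed state. Eigenvalues are recorded in units of $\hbar/2$.

```python
import math
import random

S = 1 / math.sqrt(2)
X_BASIS = [[S, S], [S, -S]]   # |+>_x and |->_x in z-coordinates
X_VALUES = [+1, -1]           # eigenvalues of S_x in units of hbar/2

def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def measure(state, eigenbasis, eigenvalues, rng=random):
    """Simulate one measurement of a non-degenerate observable.

    Returns (eigenvalue, collapsed_state). Outcome k is drawn with
    probability |<u_k|state>|^2; `state` is assumed to be normalized.
    """
    weights = [abs(inner(u, state)) ** 2 for u in eigenbasis]
    k = rng.choices(range(len(eigenbasis)), weights=weights)[0]
    return eigenvalues[k], list(eigenbasis[k])

first, collapsed = measure([1, 0], X_BASIS, X_VALUES)    # start in |+>
repeats = [measure(collapsed, X_BASIS, X_VALUES)[0] for _ in range(20)]
print(first, all(r == first for r in repeats))
```

The first reading is random (+1 or −1 with equal probability), but every repeated measurement of the same observable on the collapsed state reproduces it, which is the "100% certainty" in the trait.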


Preparing Special System States 


The impact of Trait #7 is that engineers can prepare special states to act as input 
into our quantum hardware logic. This is akin to setting a register’s value in a classical 
computer using an assignment statement, prior to beginning further logic. 


Example: Preparing a Basis (Eigenvector) State. We did this in Experiment #1.
After measuring the observable $S_z$, we ended up with two groups. By
selecting either one of those groups, we will be getting either the $|+\rangle$ state or the
$|-\rangle$ state, as we wish. Any future testing of $S_z$ will confirm that we stay in those
states, as long as we don’t subject the system to forces that modify it.


Example: Preparing a Superposition State. We did this, too. In our late
stages of tinkering with Experiment #2, we focused on the output of the second
apparatus by choosing the $|-\rangle_x$ group for further investigation. Using our state space
vocabulary, we expand the state $|-\rangle_x$ in terms of the z-eigenkets,

$$|-\rangle_x \;=\; \frac{|+\rangle - |-\rangle}{\sqrt{2}} ,$$

and realize that we have prepared a state which, with respect to the z-spin observable,
is not a basis state but a linear combination (a superposition) of the two z-basis
states (with equally weighted components). This kind of state preparation will be
very important for quantum algorithms, because it represents starting out in a state
which is neither 0 nor 1, but a combination of the two. This allows us to work
with a single state in our quantum processor and get two results for the price of one.
A single qubit and a single machine cycle will simultaneously produce answers for
both 0 and 1.




But, after we have prepared one of these states, how do we go about giving it to
a quantum processor, and what is a quantum processor? That’s answered in the first
quantum computing lesson coming up any day now. Today, we carry on with pure
quantum mechanics to acquire the full quiver of q-darts.


7.11 Summary of What Quantum Mechanics Tells 
Us About a System 


We now have a complete set of numbers and mathematical entities that give us all 
there is to know about any quantum system. We know 


1. the possible states $|\psi\rangle$ of the system (vectors in our state space),


2. the possible outcomes of a measurement of an observable of that system (the
eigenvalues, $a_k$, of the observable),


3. the eigenvector states (a.k.a. eigenbasis), $|u_k\rangle$, into which the system collapses
after we detect a value, $a_k$, of that observable, and


4. the probabilities associated with each eigenvalue outcome (specifically, $|c_k|^2$,
which are derived from the amplitudes, $c_k$, of $|\psi\rangle$ expanded along the eigenbasis
$|u_k\rangle$).


In words, the system is in a state of probabilities. We can only get certain special 
outcomes of measurement. Once we measure, the system collapses into a special state 
associated with that special outcome. The probability that this measurement occurs 
is predicted by the coefficients of the state’s eigenvector expansion. 


7.12 Dirac’s Bra-ket Notation 


We take a short break from physics to talk about the universal notation made popular 
by the famous physicist Paul Dirac. 


7.12.1 Kets and Bras 


It’s time to formalize something we’ve been using loosely up to this point: the bracket,
or bra-ket, notation. The physicists’ expression for a complex inner product,


$$\langle v|w\rangle, \qquad \langle \eta|\psi\rangle, \qquad \langle u_k|\psi\rangle, \qquad \langle +|-\rangle ,$$


can be viewed, not as a combination of two vectors in the same vector space, but
rather as two vectors, each from a different vector space. Take, for example,


$$\langle \eta|\psi\rangle .$$




The RHS of the inner product, $|\psi\rangle$, is the familiar vector in our state space, or ket
space. Nothing new there. But the LHS, $\langle \eta|$, is to be thought of as a vector from a
new vector space, called the bra-space (mathematicians call it the dual space). The
bra space is constructed by taking the conjugate transpose of the vectors in the ket
space, that is,


$$|\psi\rangle \;=\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \quad\longrightarrow\quad \langle \psi| \;=\; \left( \alpha^{*},\; \beta^{*} \right) .$$

Meanwhile, the scalars for the bra space are the same: the complex numbers, $\mathbb{C}$.


Examples 


Here are some kets (not necessarily normalized) and their associated bras. 


lv) = oo + (l= (1-4, vi+2) 


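$$|\psi\rangle \;=\; \begin{pmatrix} \sqrt{3}/2 \\ -i \end{pmatrix} \quad\longleftrightarrow\quad \langle \psi| \;=\; \left( \sqrt{3}/2,\; i \right)$$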


1 (5 
) = J5(3) ee oe 


Ss 
| 
rs 
<=> 
= 
l 
ea 
e 
ie, 


me 


[Exercise. Show that

$$|\psi\rangle \;=\; \alpha\,|+\rangle + \beta\,|-\rangle \quad\longleftrightarrow\quad \langle \psi| \;=\; \alpha^{*}\,\langle +| + \beta^{*}\,\langle -| .$$


Hint: It’s probably easier to do this without reading a hint, but if you’re stuck ...
write out the LHS as a single column vector and take the conjugate transpose.
Meanwhile the RHS can be constructed by constructing the bras for the two z-basis
vectors (again using coordinates) and combining them. The two efforts should result
in the same vector.]


Vocabulary and Notation 


Notice that bras are written as row-vectors, which is why we call them conjugate
transposes of kets. The dagger (†) is used to express the fact that a ket and bra bear
this conjugate transpose relationship,


$$\langle \psi| \;=\; |\psi\rangle^{\dagger} \qquad\text{and}\qquad |\psi\rangle \;=\; \langle \psi|^{\dagger} .$$


This should sound very familiar. Where have we seen conjugate transpose before?
Answer: When we defined the adjoint of a matrix. We even used the same dagger
(†) notation. In fact, you saw an example in which the matrix had only one row (or
one column), i.e., a vector. (See the lesson on linear transformations.) This is the
same operation: conjugate transpose.


Be careful not to say that a ket is the complex conjugate of a bra. Complex conjuga-
tion is used for scalars, only. Again, we say that a bra is the adjoint of the ket (and 
vice versa). If you want to be literal, you can always say conjugate transpose. An 
alternate term physicists like to use is Hermitian conjugate, “the bra is the Hermitian 
conjugate of the ket.” 


Example 


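Let’s demonstrate that the sum of two bras is also a bra. What does that even mean?
If $\langle \psi|$ and $\langle \eta|$ are bras, they must be the adjoints of two kets,


$$|\psi\rangle \;=\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \quad\longrightarrow\quad \langle \psi| \;=\; \left( \alpha^{*},\; \beta^{*} \right),$$

$$|\eta\rangle \;=\; \begin{pmatrix} \gamma \\ \delta \end{pmatrix} \quad\longrightarrow\quad \langle \eta| \;=\; \left( \gamma^{*},\; \delta^{*} \right).$$

We add the bras component-wise,


$$\langle \eta| + \langle \psi| \;=\; \left( \gamma^{*} + \alpha^{*},\; \delta^{*} + \beta^{*} \right) \;=\; \left( (\gamma + \alpha)^{*},\; (\delta + \beta)^{*} \right),$$


which you can see is the Hermitian conjugate of the ket


$$|\eta\rangle + |\psi\rangle \;=\; \begin{pmatrix} \gamma + \alpha \\ \delta + \beta \end{pmatrix},$$


i.e.,


$$\langle \eta| + \langle \psi| \;=\; \big( |\eta\rangle + |\psi\rangle \big)^{\dagger} .$$


That’s all a bra needs to be: the Hermitian conjugate of some ket. So the sum is a
bra.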
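The same bookkeeping can be mimicked in a few lines of plain Python, with a list standing in for a column (ket) and a tuple standing in for a row (bra). The helper names `bra` and `add`, and the two sample kets, are assumptions chosen purely for illustration.

```python
def bra(ket):
    """Conjugate transpose of a ket: a list (column) becomes a tuple (row)
    whose entries are complex-conjugated."""
    return tuple(complex(x).conjugate() for x in ket)

def add(u, v):
    """Component-wise sum, keeping the container type (list or tuple)."""
    return type(u)(a + b for a, b in zip(u, v))

# Hypothetical sample kets.
psi = [1 + 1j, 2]
eta = [3j, 1 - 2j]

sum_of_bras = add(bra(eta), bra(psi))    # <eta| + <psi|
bra_of_sum = bra(add(eta, psi))          # (|eta> + |psi>)^dagger

print(sum_of_bras)
print(sum_of_bras == bra_of_sum)
```

Adding first and conjugating afterwards gives the same row vector as conjugating first and adding afterwards, which is the content of the displayed identity.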


There is (at least) one thing we must confirm. As always, when we define anything 
in terms of coordinates, we need to be sure that the definition is independent of our 
choice of basis (since coordinates arise from some basis). I won’t prove this, but you 
may choose to do so as an exercise. 




[Exercise. Pick any three axioms of a vector space and prove that the bras in 
the bra space obey them.] 


[Exercise. Show that the definition of bra space is independent of basis. ] 


Remain Calm. There is no cause for alarm. Bra space is simply a device that
allows us to manipulate the equations without making mistakes. It gives us the ability to
talk about the LHS and the RHS of an inner product individually and symmetrically, 
unattached to the inner product. 


Elaborate Example 


We will use the bra notation to compute the inner product of the two somewhat
complicated kets,


$$c\,|\psi\rangle + |\eta\rangle \quad\text{on the left, and}$$

$$d\,|\phi\rangle - \frac{f\,|\theta\rangle}{g} \quad\text{on the right,}$$


where c, d, f and g are some complex scalars. The idea is very simple. We first take 
the Hermitian conjugate of the intended left vector by 


1. turning all of the kets (in that left vector) into bras, 


2. taking the complex conjugate of any scalars (in that left vector) which are 
outside a ket, 


3. forming the desired inner product, and 


4. using the distributive property to combine the component kets and bras. 


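Applying steps 1 and 2 on the left ket produces the bra


$$c^{*}\,\langle \psi| + \langle \eta| .$$

Step 3 gives us

$$\Big( c^{*}\,\langle \psi| + \langle \eta| \Big) \left( d\,|\phi\rangle - \frac{f\,|\theta\rangle}{g} \right),$$

and applying step 4 we get


$$c^{*} d\,\langle \psi|\phi\rangle \;+\; d\,\langle \eta|\phi\rangle \;-\; \frac{c^{*} f\,\langle \psi|\theta\rangle + f\,\langle \eta|\theta\rangle}{g} .$$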


It seems overly complicated, but usually we apply it to simpler combinations and it 
is far less cumbersome than turning all the constituent kets into their coordinates, 
combining them, taking the conjugates of the left ket and doing the final “dot.” 
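If you distrust an expansion like this one, you can always test it against the brute-force route the paragraph describes. The sketch below, in plain Python, takes a left ket of the form $c|\psi\rangle + |\eta\rangle$ and a right ket of the form $d|\phi\rangle - (f/g)|\theta\rangle$; the ket names, the sample scalars and the sample coordinates are all assumptions picked for illustration.

```python
def inner(u, v):
    """Complex inner product <u|v>: conjugate the left vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def combo(terms):
    """Linear combination of kets; `terms` is a list of (scalar, ket) pairs."""
    n = len(terms[0][1])
    return [sum(s * k[i] for s, k in terms) for i in range(n)]

# Hypothetical scalars and kets (z-coordinates).
c, d, f, g = 2 - 1j, 0.5j, 3, 1 + 1j
psi, eta = [1, 1j], [2, -1]
phi, theta = [1j, 1], [0, 1 - 1j]

# Brute force: build both kets in coordinates, then take one inner product.
left = combo([(c, psi), (1, eta)])
right = combo([(d, phi), (-f / g, theta)])
direct = inner(left, right)

# Bra-ket algebra: conjugate the left scalars, then distribute.
expanded = (c.conjugate() * d * inner(psi, phi) + d * inner(eta, phi)
            - (c.conjugate() * f * inner(psi, theta) + f * inner(eta, theta)) / g)

print(abs(direct - expanded) < 1e-12)
```

Only the scalar $c$ on the left picks up a conjugate; the scalars $d$, $f$ and $g$ sit in the right-hand vector and pass through untouched.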




Simple Example 


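$$\begin{aligned}
{}_x\langle +|+\rangle_x \;&=\; \left( \frac{|+\rangle + |-\rangle}{\sqrt{2}} \right)^{\dagger} \left( \frac{|+\rangle + |-\rangle}{\sqrt{2}} \right)
 \;=\; \left( \frac{\langle +| + \langle -|}{\sqrt{2}} \right) \left( \frac{|+\rangle + |-\rangle}{\sqrt{2}} \right) \\[1ex]
 &=\; \frac{\langle +|+\rangle + \langle +|-\rangle + \langle -|+\rangle + \langle -|-\rangle}{2}
 \;=\; \frac{1 + 0 + 0 + 1}{2} \;=\; 1 .
\end{aligned}$$
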

The first thing we did was to express |+) , in the z-basis without converting it to a 
bra. Then, we used the techniques just presented to convert that larger expression 
into a bra. From there, it was a matter of distributing the individual kets and bras 
and letting them neutralize each other. Normally, we would perform the first two 
steps at once, as the next example demonstrates. 


_ CH] + 4] foie Mea 
pee a en | 
Ee eet tel) OS ey 


ret g 
2 


[Exercise. Compute ae | +) and ae | +) 


Summary 


The bra space is a different vector space from the ket (our state) space. It is, however,
an exact copy (an isomorphism) of the state space in the case of finite dimensions,
which is all we ever use in quantum computing. You now have enough chops to prove this
easily, so I leave it as an ...


[Exercise. Prove that the adjoint of the ket basis is a basis for bra space. Hint:
Start with any bra. Find the ket from which it came (this step is not always possible in
infinite dimensional Hilbert space, as your physics instructors will tell you). Expand
that ket in any state space basis, then ...]


7.12.2 The Adjoint of an Operator 


We now leverage the earlier definition of a matrix adjoint and extend our ability to
translate expressions from the ket space to the bra space (and back). If $A$ is an
operator (or matrix) in our state space (not necessarily Hermitian; it could be any
operator), then its adjoint $A^{\dagger}$ can be viewed as an operator (or matrix) in the bra
space.


$$A^{\dagger} :\; \langle \psi| \;\longmapsto\; \langle \phi| .$$




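$A^{\dagger}$ operates on bras, but since bras are row-vectors, it has to operate on the right,
not left:


$$\langle \psi|\, A^{\dagger} \;\longmapsto\; \langle \phi| .$$


And the symmetry we would like to see is that the “output” $\langle \phi|$ to which $A^{\dagger}$ maps
$\langle \psi|$ is the bra corresponding to the ket $A|\psi\rangle$. That dizzying sentence translated into
symbols is


$$\langle \psi|\, A^{\dagger} \;\longleftrightarrow\; A\,|\psi\rangle .$$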


Example. Start with our familiar 2-dimensional state space and consider the 
operator, 


Its adjoint is 


$A^{\dagger}$ maps the bra, $\langle \psi| = (1+i,\; 3)$, into
(y| At = a+, (| i) = (1-427, 2). 


Meanwhile, back in ket space, $A$ maps $|\psi\rangle = \langle \psi|^{\dagger}$ into


am = (3) (3°) = G2). 


As you can see by comparing the RHS of both calculations, the adjoint of $A|\psi\rangle$ is
$\langle\psi|\,A^\dagger$, in agreement with the claim.
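If you like to see this kind of claim confirmed by machine, here is a minimal sketch in plain Python. The operator `A` and the ket below are hypothetical values I picked for illustration (they are not the matrices of the example above), and the helper names `dagger` and `matmul` are my own. The sketch only checks that the adjoint of $A|\psi\rangle$ equals $\langle\psi|\,A^\dagger$ with the operator acting on the right.

```python
def dagger(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [[M[r][c].conjugate() for r in range(rows)] for c in range(cols)]

def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1j, 2], [3, 1 - 1j]]       # hypothetical operator, not Hermitian
ket = [[1 - 1j], [3]]            # hypothetical column vector |psi>
bra = dagger(ket)                # row vector <psi| = (1+i, 3)

left = dagger(matmul(A, ket))    # adjoint of the ket A|psi>
right = matmul(bra, dagger(A))   # <psi| A^dagger, operator applied on the right

print(left)
print(right)
```

Both printed rows should agree entry by entry, which is the bra/ket symmetry described above for this one hypothetical case.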


7.12.3 The Adjoint Conversion Rules 


The reason we added adjoint operators into the mix was to supply the final key to 
the processes of converting any combination of state space kets into bras and any 
combination of bras into kets. Say we desire to convert the ket expression 


\[ c\,A|\psi\rangle \;+\; d\,\langle +|\eta\rangle\,|\eta\rangle \]


into its bra counterpart. The rules will guide us, and they work for expressions far 
more complex with equal ease. 


We state the rules, then we will try them out on this expression. This calls for a 
new trait. 




7.12.4 Trait #8 (Adjoint Conversion Rules) 


• The terms of a sum can be (but don’t have to be) left in the same order.

• The order of factors in a product is reversed.

• Kets are converted to bras (i.e., take the adjoints).

• Bras are converted to kets (i.e., take the adjoints).

• Operators are converted to their adjoints.

• Scalars are converted to their complex conjugates.

• (Covered by the above, but stated separately anyway:) Inner products are reversed.

• When done (for readability only), rearrange each product so that the scalars are
on the left of the vectors.


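If we apply the adjoint conversion rules, except for the readability step, to the
above combination we get

\[ \big( c\,A|\psi\rangle + d\,\langle +|\eta\rangle\,|\eta\rangle \big)^\dagger \;=\; \langle\psi|\,A^\dagger\,c^* \;+\; \langle\eta|\,\langle\eta|+\rangle\,d^* , \]

which we rearrange to

\[ c^*\,\langle\psi|\,A^\dagger \;+\; d^*\,\langle\eta|+\rangle\,\langle\eta| . \]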


You'll get fast at this with practice. I don’t want to spend any more real estate on 
the topic, since we don’t apply it very much in our first course, CS 83A, but here are 
a couple exercises that will take care of any lingering urges. 

[Exercise. Use the rules to convert the resulting bra of the last example back
into a ket. Confirm that you get the ket we started with.]

[Exercise. Create a wild ket expression consisting of actual literal scalars, matrices
and column vectors. Use the rules to convert it to a bra. Then use the same rules
to convert the bra back to a ket. Confirm that you get the ket you started with.]
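Here is a minimal sketch, in plain Python, of how one might check the conversion rules numerically on the expression $c\,A|\psi\rangle + d\,\langle +|\eta\rangle\,|\eta\rangle$. Every literal value below (`c`, `d`, `A`, `psi`, `eta`) is a hypothetical choice of mine, and the helper functions are illustrative, not part of any library. The bra is built twice: once by brute-force conjugate transposition of the whole ket, and once term by term using the rules.

```python
def dagger(M):
    """Conjugate transpose of a list-of-rows matrix (kets are n x 1, bras 1 x n)."""
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def scale(s, M):
    return [[s * x for x in row] for row in M]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical scalars, operator and kets (z-basis coordinates)
c, d = 2 - 1j, 3j
A = [[1, 1j], [0, 2]]
psi = [[1], [1j]]
eta = [[2], [1 - 1j]]
plus = [[1], [0]]

# The ket expression: c A|psi> + d <+|eta> |eta>
inner_plus_eta = matmul(dagger(plus), eta)[0][0]
ket_expr = add(scale(c, matmul(A, psi)), scale(d * inner_plus_eta, eta))

# The bra produced by the rules: c* <psi| A^dagger + d* <eta|+> <eta|
inner_eta_plus = matmul(dagger(eta), plus)[0][0]      # the reversed inner product
bra_by_rules = add(scale(c.conjugate(), matmul(dagger(psi), dagger(A))),
                   scale(d.conjugate() * inner_eta_plus, dagger(eta)))

print(dagger(ket_expr))   # brute-force adjoint of the whole ket
print(bra_by_rules)       # rule-by-rule conversion
```

The two printed rows should match. Converting `bra_by_rules` back with `dagger` returns the original ket, which is the round trip the exercises ask for.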


7.13 Expectation Values 


Say we have modeled some physical quantum system, $\mathscr{S}$, with a Hilbert space, $\mathcal{H}$.
Imagine, further, that we want to study some observable, $A$, that has (all
non-degenerate) eigenvalues $\{a_k\}$ with corresponding eigenkets $\{|u_k\rangle\}$. Most importantly,
we assume that we can prepare many identical copies of $\mathscr{S}$, all in the same state $|\psi\rangle$.




(We did this very thing by selecting only z-up electrons in a Stern-Gerlach-like
apparatus, for example.) We now look at our state expanded along the $A$ eigenbasis,

\[ |\psi\rangle = \sum_k c_k\,|u_k\rangle . \]

How do the amplitudes, $c_k$, and their corresponding probabilities, $|c_k|^2$, make
themselves felt by us human experimenters?

The answer to this question starts by taking many repeated measurements of the
observable $A$ on these many identical states $|\psi\rangle$ and recording our results.

[Exercise. Explain why we can’t get the same results by repeating the $A$
measurements on a single system $\mathscr{S}$ in state $|\psi\rangle$.]


7.13.1 The Mean of the Experiments 


We'll take a large number, $N$, of measurements, record them, and start doing
elementary statistics on the results. We label the measurements we get using

\[ j\text{th measurement of } A \;\equiv\; m_j . \]

As a start, we compute the average (or mean) of all $N$ measurements,

\[ \overline{m} \;\equiv\; \frac{1}{N} \sum_{j=1}^{N} m_j . \]

If we take a large enough $N$, what do we expect this average to be? The answer
comes from the statistical result called the law of large numbers, which says that this
value will approach the expectation value, $\mu$, as $N \to \infty$, that is,

\[ \lim_{N \to \infty} \overline{m} = \mu . \]

This is good and wonderful, but I have not yet defined the expectation value $\mu$. Better
do that, fast.

[Note. I should really have labeled $\overline{m}$ with $N$, as in $\overline{m}_N$, to indicate that each
average depends on the number of measurements taken, as we are imagining that we
can do the experiment with larger and larger $N$. But you understand this without
the extra notation.]


7.13.2 Defining Expectation Value 


This conveniently brings us back around to Trait #6 (the Fourth Postulate of QM)
concerning probabilities of outcomes. It asserts that the probability of detecting the
measurement (eigenvalue) $a_k$ is given by its $k$th expansion coefficient along the $A$
basis ($c_k$), specifically by its magnitude-squared, $|c_k|^2$,

\[ P(a_k)_{|\psi\rangle} = c_k^*\,c_k = |c_k|^2 . \]

This motivates the definition. Expectation value, $\mu$, is defined by summing up each
possible outcome (the eigenvalues $a_k$), weighted by their respective probabilities, $|c_k|^2$:

\[ \mu \;\equiv\; \sum_k |c_k|^2\, a_k . \]


In case you don’t see why this has the feeling of an expectation value (something we 
might expect from a typical measurement, if we were forced to place a bet), read it 
in English: 


The first measurable value times the probability of getting that value 
plus 
the second measurable value times the probability of getting that value 


plus 


In physics, rather than using the Greek letter $\mu$, the notation for expectation value
focuses attention on the observable we are measuring, $A$,

\[ \langle A \rangle_{|\psi\rangle} \;\equiv\; \sum_k |c_k|^2\, a_k . \]


Fair Warning. In quantum mechanics, you will often see the expectation value of
the observable, $A$, written without the subscript, $|\psi\rangle$,

\[ \langle A \rangle , \]

but this doesn’t technically make sense. There is no such thing as an expectation
value for an observable that applies without some assumed state; you must know
which $|\psi\rangle$ has been prepared prior to doing the experiment. If this is not obvious to
you, look up at the definition one more time: We don’t have any $c_k$ to use in the
formula unless there is a $|\psi\rangle$ in the room, because

\[ |\psi\rangle = \sum_k c_k\,|u_k\rangle . \]

When authors suppress the subscript state on the expectation value, it’s usually
because the context strongly implies the state or the state is explicitly described
earlier and applies “until further notice.”


Calculating an expectation value tells us one way we can use the amplitudes.
This, in turn, acts as an approximation of the average, $\overline{m}$, of a set of experimental
measurements on multiple systems in the identical state.
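To make the connection between the sample mean $\overline{m}$ and the expectation value $\mu$ tangible, here is a small simulation sketch in plain Python. It is only an illustration under stated assumptions: the two amplitudes, the two eigenvalues and the random seed are hypothetical, and the "measurement" is a pseudo-random draw weighted by $|c_k|^2$, not a real quantum experiment.

```python
import random

def expectation(amplitudes, eigenvalues):
    """mu = sum over k of |c_k|^2 * a_k (the definition given above)."""
    return sum(abs(c) ** 2 * a for c, a in zip(amplitudes, eigenvalues))

def sample_mean(amplitudes, eigenvalues, n, rng):
    """Average of n simulated measurements; outcome a_k is drawn with probability |c_k|^2."""
    probabilities = [abs(c) ** 2 for c in amplitudes]
    outcomes = rng.choices(eigenvalues, weights=probabilities, k=n)
    return sum(outcomes) / n

# Hypothetical two-outcome system: probabilities 0.3 and 0.7
amps = [0.3 ** 0.5, 1j * 0.7 ** 0.5]
eigs = [+0.5, -0.5]            # e.g. +hbar/2 and -hbar/2 in units where hbar = 1

mu = expectation(amps, eigs)
rng = random.Random(0)         # fixed seed so the run is repeatable
for n in (10, 1000, 100000):
    print(n, sample_mean(amps, eigs, n, rng), mu)
```

As `n` grows, the middle column should settle near the last one, which is the law of large numbers at work.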




7.13.3 Computing Expectation Value 


It seems like we should be done with this section. We have a formula for the
expectation value, $\langle A \rangle_{|\psi\rangle}$, so what else is there to do? It turns out that computing that sum
isn’t always as easy or efficient as computing the value a different way.


7.13.4 Trait #9 (Computing an Expectation Value) 


The expectation value of an observable $A$ of a system in state $|\psi\rangle$ is usually computed
using the expression $\langle\psi|A|\psi\rangle$.


To add formality, we will also call this the 


Expectation Value Theorem. 


\[ \langle A \rangle_{|\psi\rangle} = \langle\psi|A|\psi\rangle . \]


The $A$ on the RHS can be thought of either as the operator representing $A$, or the
matrix for the operator. The expression can be organized in various ways, all of which
result in the same real number. For instance, we can first compute $A|\psi\rangle$ and then
take the inner product

\[ \langle\psi|\,\big( A|\psi\rangle \big) , \]

or we can first apply $A$ to the bra $\langle\psi|$, and then dot it with the ket,

\[ \big( \langle\psi|\,A \big)\,|\psi\rangle . \]

If you do it this way, be careful not to take the adjoint of $A$. Just because we apply
an operator to a bra does not mean we have to take its Hermitian conjugate. The
formula says to use $A$, not $A^\dagger$, regardless of which vector we feed it.

[Exercise. Prove that the two interpretations of $\langle\phi|A|\psi\rangle$ are equal by expressing
everything in component form with respect to any basis. Hint: It’s just (a row vector)
× (a matrix) × (a column vector), so multiply it out both ways.]


Proof of the Expectation Value Theorem. This is actually one way to prove
the last exercise nicely. Express everything in the $A$-basis (i.e., the eigenkets of $A$
which Trait #4 tells us form a basis for the state space $\mathcal{H}$). We already know that

\[ |\psi\rangle = \sum_k c_k\,|u_k\rangle = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{pmatrix}_{A\text{-basis}} , \]

which means (by our adjoint conversion rules)

\[ \langle\psi| = \sum_k c_k^*\,\langle u_k| = \big( c_1^*,\ c_2^*,\ c_3^*,\ \ldots,\ c_n^* \big)_{A\text{-basis}} . \]




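Finally, what’s $A$ in its own eigenbasis? We know that any basis vector expressed in
that basis’ coordinates has a preferred basis look, $(1, 0, 0, 0, \ldots)^t$, $(0, 1, 0, 0, \ldots)^t$,
etc. To that, add the definition of an eigenvector and eigenvalue

\[ M\mathbf{u} = a\,\mathbf{u} , \]

and you will conclude that, in its own eigenbasis, the matrix for $A$ is 0 everywhere
except along its diagonal, which holds the eigenvalues,

\[ A = \begin{pmatrix} a_1 & 0 & 0 & \cdots & 0 \\ 0 & a_2 & 0 & \cdots & 0 \\ 0 & 0 & a_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_n \end{pmatrix}_{A\text{-basis}} . \]

[Oops. I just gave away the answer to one of today’s exercises. Which one?] We now
have all our players in coordinate form, so

\[ \langle\psi|A|\psi\rangle = \big( c_1^*,\ c_2^*,\ c_3^*,\ \ldots,\ c_n^* \big) \begin{pmatrix} a_1 & 0 & 0 & \cdots & 0 \\ 0 & a_2 & 0 & \cdots & 0 \\ 0 & 0 & a_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_n \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{pmatrix} \]

\[ = \big( c_1^*,\ c_2^*,\ c_3^*,\ \ldots,\ c_n^* \big) \begin{pmatrix} a_1 c_1 \\ a_2 c_2 \\ a_3 c_3 \\ \vdots \\ a_n c_n \end{pmatrix} = \sum_k |c_k|^2\, a_k , \]

which is the definition of $\langle A \rangle_{|\psi\rangle}$.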
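The proof works in the $A$-basis, but the theorem pays off when the state is given in some other basis. The sketch below (plain Python, units with $\hbar = 1$ as an assumption) compares the two routes for the observable $S_x$ written in z-coordinates. The state `psi` is a hypothetical normalized vector of my choosing, and the helper names are illustrative only.

```python
import cmath
import math

HBAR = 1.0                      # assumption: units in which hbar = 1
r = 1 / math.sqrt(2)

# S_x in z-coordinates, with its eigenvalues and eigenkets |+>_x and |->_x
S_x = [[0, HBAR / 2], [HBAR / 2, 0]]
eigenpairs = [(+HBAR / 2, [r, r]), (-HBAR / 2, [r, -r])]

def inner(u, v):
    """Complex inner product <u|v>: the left vector's coordinates are conjugated."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

psi = [0.6, 0.8 * cmath.exp(1j * math.pi / 3)]   # hypothetical normalized state

# Route 1: the definition, sum of |c_k|^2 a_k with c_k = <u_k|psi>
weighted_sum = sum(abs(inner(u, psi)) ** 2 * a for a, u in eigenpairs)

# Route 2: the theorem, <psi| A |psi>, with no eigen-decomposition needed
sandwich = inner(psi, apply(S_x, psi))

print(weighted_sum)
print(sandwich)
```

The second number comes out as a complex value whose imaginary part should vanish up to rounding, and its real part should agree with the first.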


7.13.5 Expectation Values in Spin-1/2 Systems 


I'll do a few examples to demonstrate how this looks in our 2-dimensional world. 


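The Expectation Value of $S_z$ Given the State $|+\rangle$

This is a great sanity check, since we know from Trait #7 (the fifth postulate of
QM) that $S_z$ will always report a $+\frac{\hbar}{2}$ with certainty if we start in that state. Let’s
confirm it.

\[ \langle +|\,S_z\,|+\rangle = (1,\ 0)\;\frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = +\frac{\hbar}{2} \]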




That was painless. Notice that this result is weaker than what we already knew
from Trait #7. This is telling us that the average reading will approach the $\left(+\frac{\hbar}{2}\right)$
eigenvalue in the long run, but in fact every measurement will be $\left(+\frac{\hbar}{2}\right)$.

[Exercise. Show that the expectation value $\langle -|\,S_z\,|-\rangle = -\frac{\hbar}{2}$.]


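The Expectation Value of $S_x$ Given the State $|-\rangle_x$

Once again, we know the answer should be $-\frac{\hbar}{2}$ because we’re starting in an eigenstate
of $S_x$, but we will do the computation in the z-basis, which involves a wee bit more
arithmetic and will serve to give us some extra practice.

\[ {}_x\langle -|\,S_x\,|-\rangle_x = \frac{1}{\sqrt 2}\,(1,\ -1)\;\frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \]

\[ = \frac{\hbar}{4}\,(1,\ -1)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{\hbar}{4}\,(1,\ -1)\begin{pmatrix} -1 \\ 1 \end{pmatrix} = -\frac{\hbar}{2} \]

[Exercise. Show that the expectation value ${}_x\langle +|\,S_x\,|+\rangle_x = +\frac{\hbar}{2}$.]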


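The Expectation Value of $S_z$ Given the State $|-\rangle_y$

This time, it’s not so obvious. However, we can guess. Since the state

\[ |-\rangle_y = \frac{|+\rangle - i\,|-\rangle}{\sqrt 2} , \]

we see that the probability for each outcome is 1/2. Over time half will result in an
$S_z$ measurement of $+\frac{\hbar}{2}$ and half will give us $-\frac{\hbar}{2}$, so the average should be close to 0.
Let’s verify that.

\[ {}_y\langle -|\,S_z\,|-\rangle_y = \frac{1}{\sqrt 2}\,(1,\ i)\;\frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -i \end{pmatrix} \]

\[ = \frac{\hbar}{4}\,(1,\ i)\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ -i \end{pmatrix} = \frac{\hbar}{4}\,(1,\ i)\begin{pmatrix} 1 \\ i \end{pmatrix} = 0 \]

[Exercise. Show that the expectation value ${}_y\langle +|\,S_z\,|+\rangle_y = 0$.]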
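For readers who want to double-check hand arithmetic like this, here is a minimal sketch in plain Python that evaluates $\langle\psi|A|\psi\rangle$ for the three cases of this section. It assumes units with $\hbar = 1$, the usual z-basis matrices for $S_z$ and $S_x$, and the z-coordinates $|-\rangle_x = (1, -1)^t/\sqrt 2$ and $|-\rangle_y = (1, -i)^t/\sqrt 2$. The function name `expectation` is my own.

```python
import math

HBAR = 1.0                       # assumption: units in which hbar = 1
r = 1 / math.sqrt(2)

S_z = [[HBAR / 2, 0], [0, -HBAR / 2]]
S_x = [[0, HBAR / 2], [HBAR / 2, 0]]

plus_z = [1, 0]                  # |+>
minus_x = [r, -r]                # |->_x in z-coordinates
minus_y = [r, -1j * r]           # |->_y in z-coordinates

def expectation(M, ket):
    """<ket| M |ket>: apply M to the ket, then take the inner product with the bra."""
    M_ket = [sum(M[i][k] * ket[k] for k in range(len(ket))) for i in range(len(M))]
    return sum(a.conjugate() * b for a, b in zip(ket, M_ket))

print(expectation(S_z, plus_z))    # S_z in the state |+>
print(expectation(S_x, minus_x))   # S_x in the state |->_x
print(expectation(S_z, minus_y))   # S_z in the state |->_y
```

Up to floating-point rounding, the three values should reproduce $+\frac{\hbar}{2}$, $-\frac{\hbar}{2}$ and $0$ (with $\hbar = 1$). Swapping in `[r, r]` or `[r, 1j * r]` gives a quick check on the exercises.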


7.14 It’s About Time 


We are done with all the math and physics necessary to do rigorous quantum comput- 
ing at the basic level. We’ll be adding a few lessons on math as the course progresses, 




but for now you're ready to dive into the lectures on single qubit systems and early 
algorithms. 


The next chapter is a completion of our quantum mechanics primer that covers 
the basics of time evolution, that is, it describes the laws by which quantum systems 
evolve over time. It isn’t required for CS 83A, but you'll need it for the later courses 
in the sequence. You can skip it if you are so inclined or, if you “opt in” immediately, 
it’ll provide the final postulates and traits that comprise a complete study of quantum 
formalism, including the all-important Schrödinger equation.


Whether you choose to go directly to the chapter on qubits or first learn the 
essentials of the time-dependent Schrödinger equation, you’re in for a treat. Both
subjects provide a sense of purpose and completion to all the hard work we’ve done 
up to this point. 


Either way, it’s about time. 




Chapter 8 


Time Dependent Quantum 
Mechanics 


8.1 Time Evolution in Quantum Computing 


Once a quantum system is put into a known state it will inevitably evolve over 
time. This might be a result of the natural laws of physics taking their toll on the 
undisturbed system or it may be that we are intentionally subjecting the system to 
a modifying force such as a quantum logic gate. In either case, the transformation is 
modeled by a linear operator that is unitary. 


We’ve seen unitary matrices already, and even if you skip this chapter, you’re 
equipped to go on to study qubits and quantum logic because the matrices and 
vectors involved do not have a time variable, t, in their respective coefficients - they 
are all complex constants. 


But there are unitary transformations and quantum state vectors in which the
elements themselves are functions of time, and it is that kind of evolution we will 
study today. It could represent the noise inherent in a system or it may just be a 
predictable change that results from the kind of hardware we are using. 


8.2 The Hamiltonian 


8.2.1 Total Energy 


It turns out that the time evolution of a quantum system is completely determined by 
the total energy of the system (potential plus kinetic). There is a name for this quan- 
tity: The Hamiltonian. However the term is used differently in quantum mechanics 
than in classical mechanics, as we’ll see in a moment. 




8.2.2 From Classical Hamiltonian to Quantum Hamiltonian 


We have to figure out a way to express the energy of a system $\mathscr{S}$ using pen and paper
so we can manipulate symbols, work problems and make predictions about how the
system will look at 5 PM if we know how it started out at 8 AM. It sounds like a
daunting task, but the 20th century physicists gave us a conceptually simple recipe
for the process. The first step is to define a quantum operator (a matrix for our
state space) that corresponds to the total energy. We’ll call this recipe ...


Trait #10 (Constructing the Hamiltonian) 
To construct an operator, H, that represents the energy of a quantum system: 


1. Express the classical energy, $\mathscr{H}$, formulaically in terms of basic classical
concepts (e.g., position, momentum, angular momentum, etc.). You will have $\mathscr{H}$
on the LHS of a formula and all the more basic terms on the RHS.

2. Replace the occurrence of the classical variables on the RHS by their (well
known) quantum operators, and replace the symbol for classical energy, $\mathscr{H}$,
on the LHS by its quantum symbol $H$.


Vocabulary. Although the total energy, whether classical or quantum, is a scalar,
the Hamiltonian in the quantum case is typically the operator associated with the
(measurement of) that scalar, while the classical Hamiltonian continues to be
synonymous with the scalar itself. As you can see from reading Trait #10, to
distinguish the classical Hamiltonian, which works for macroscopic quantities, from
the quantum Hamiltonian, we use $\mathscr{H}$ for classical and $H$ for quantum.


This is a bit hard to visualize until we do it. Also, it assumes we have already been 
told what operators are associated with the basic physical concepts on the RHS. As 
it happens, there are very few such basic concepts, and their quantum operators are 
well known. For example, 3-dimensional positional coordinates (x, y, z) correspond to 
three simple quantum operators X, Y and Z. Because I have not burdened you with 
the Hilbert space that models position and momentum, I can’t give you a meaningful 
and short example using those somewhat familiar concepts, but in a moment you'll 
see every detail in our spin-1/2 Hilbert space, which is all we really care about. 


8.3 The Hamiltonian for a Spin-1/2 System


8.3.1 A Classical Hamiltonian 


Consider a stationary electron in a constant magnetic field. (Warning: this is not
the Stern-Gerlach experiment since that requires a non-homogeneous magnetic field,
not to mention a moving electron.) Because the electron is stationary, all the energy
is potential and depends only on the spin (direction and magnitude) of the electron
in relation to the magnetic field. (You may challenge that I forgot to account for
the rotational kinetic energy, but an electron has no spatial extent, so there is no
classical moment of inertia, and therefore no rotational kinetic energy.) We want to
build a classical energy equation, so we treat spin as a classical 3-dimensional vector
representing the intrinsic angular momentum, $(s_x, s_y, s_z)^t$. A dot product between
this vector and the magnetic field vector, $\mathbf{B}$, expresses this potential energy and yields
the following classical Hamiltonian

\[ \mathscr{H} = -\gamma\,\mathbf{B}\cdot\mathbf{S} , \]

where $\gamma$ is a scalar known by the impressive name gyromagnetic ratio whose value is
not relevant at the moment. (We are temporarily viewing the system as if it were
classical in order to achieve step 1 in Trait #10, but please understand that it
already has one foot in the quantum world simply by the inclusion of the scalar $\gamma$.
Since scalars don’t affect us, this apparent infraction doesn’t disturb the process.)


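Defining the z-Direction. The dot product only cares about the relationship
between two vectors,

\[ \mathbf{v}\cdot\mathbf{w} = v\,w\,\cos\theta , \]

where $\theta$ is the angle between them, so we can rotate the pair as a fixed assembly, that
is, preserving angle $\theta$. Therefore, let’s establish a magnetic field (with magnitude $B$)
pointing in the $+z$-direction,

\[ \mathbf{B} = B\,\hat{\mathbf{z}} = \begin{pmatrix} 0 \\ 0 \\ B \end{pmatrix} , \]

and let the spin vector, $\mathbf{S}$, go along for the rotational ride. This does not produce
a unique direction for the spin, but we only care about the polar angle $\theta$, which we
have constrained to remain unchanged. Equivalently, we can define the z-direction
to be wherever our $\mathbf{B}$ field points. Either way, we get a very neat simplification.

The classical spin has well-defined real-valued components $s_x$, $s_y$ and $s_z$ (not
operators yet),

\[ \mathbf{S} = \begin{pmatrix} s_x \\ s_y \\ s_z \end{pmatrix} , \]

and we use this vector to evaluate the dot product, above. Substituting gives

\[ \mathscr{H} = -\gamma \begin{pmatrix} 0 \\ 0 \\ B \end{pmatrix} \cdot \begin{pmatrix} s_x \\ s_y \\ s_z \end{pmatrix} = -\gamma B\, s_z . \]

This completes step 1 in Trait #10, and I can finally show you how easy it is to do
step 2.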


This completes step 1 in Trait #10, and I can finally show you how easy it is to do 
step 2. 




8.3.2 A Quantum Hamiltonian 


We saw that (by a century of experimentation) the quantum operator corresponding
to the classical z-component of spin is $S_z$. So we simply replace the classical $s_z$ with
our now familiar quantum operator $S_z$ to get the quantum Hamiltonian,

\[ H = -\gamma B\, S_z . \]

It’s that simple.

Note that while $S_z$ may change from moment-to-moment (we aren’t sure yet), $B$
and $\gamma$ are constants, so the Hamiltonian is not time-dependent; it always gives an
answer based solely on the relationship between $\mathbf{S}$ and $\mathbf{B}$.

We now have a scalar relation between the Hamiltonian operator, $H$, and the
z-spin projection operator, $S_z$, giving us a short-cut in our quest for $H$; we need only
substitute the matrix we computed earlier for $S_z$ into this equation and get

\[ H = -\gamma B\,\frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \]


8.4 The Energy Eigenkets


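8.4.1 Relationship Between $H$ and $S_z$

Because the operator $H$ is a scalar multiple of our well studied $S_z$, we can easily find
its eigenvectors (also known as the energy eigenkets) and eigenvalues; the eigenvectors
are the same $|\pm\rangle$, and to get their respective eigenvalues we simply multiply those
of $S_z$ by $-\gamma B$. We’d better prove this, as it might not be obvious to everyone. We
know that eigenvalue-eigenvector pairs for $S_z$ are given by

\[ S_z\,|+\rangle = \left(\frac{\hbar}{2}\right)|+\rangle \quad\text{and}\quad S_z\,|-\rangle = \left(-\frac{\hbar}{2}\right)|-\rangle , \]

but we now know that $S_z$ and $H$ bear the scalar relationship,

\[ S_z = \left(-\frac{1}{\gamma B}\right) H . \]

Substituting this into the $S_z$ eigenvector expressions gives

\[ \left(-\frac{1}{\gamma B}\right) H\,|+\rangle = \left(\frac{\hbar}{2}\right)|+\rangle \quad\text{and}\quad \left(-\frac{1}{\gamma B}\right) H\,|-\rangle = \left(-\frac{\hbar}{2}\right)|-\rangle , \]

or

\[ H\,|+\rangle = \left(-\frac{\gamma B\hbar}{2}\right)|+\rangle \quad\text{and}\quad H\,|-\rangle = \left(\frac{\gamma B\hbar}{2}\right)|-\rangle . \]

But this says that $H$ has the same two eigenvectors as $S_z$, only they are associated
with different eigenvalues,

\[ |+\rangle \;\longleftrightarrow\; -\frac{\gamma B\hbar}{2} , \qquad |-\rangle \;\longleftrightarrow\; +\frac{\gamma B\hbar}{2} . \]

If we measure the energy of the system, we will get $-\frac{\gamma B\hbar}{2}$ if the electron collapses into
the $|+\rangle$ state, pointing as close as quantum mechanics allows toward the $(+z)$-axis,
the direction of the $B$-field. Meanwhile, if it collapses into the $|-\rangle$ state, we will
get $+\frac{\gamma B\hbar}{2}$, pointing as far as quantum mechanics allows from the $(+z)$-axis, opposite the
direction of the $B$-field.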


Does this make sense? 


Yes. When a magnetic dipole (which is what electron spin represents, discounting
the gyromagnetic ratio $\gamma$) is pointing in the direction of a magnetic field, energy is
minimum. Imagine a compass needle pointing magnetic north. The potential energy
of the compass-Earth system is at its minimum: It takes no energy to maintain that
configuration. However, if the dipole is pointing opposite the direction of the magnetic
field, energy is at its maximum: Imagine the compass needle pointing magnetic south.
Now it takes energy to hold the needle in place. The potential energy of the
compass-Earth system is maximum. The fact that the energy measurement is negative for the
$|+\rangle$ but positive for the $|-\rangle$ faithfully represents physical reality.

Also, note that for a spin-1/2 system, any state vector, $|\psi\rangle$, has the same
coordinate expression whether we expand it along the eigenkets of $S_z$ or those of $H$, as
they are the same two kets, $|+\rangle$ and $|-\rangle$.
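As a quick numerical companion to this section, here is a minimal Python sketch that builds the matrix $H = -\gamma B\, S_z$ and confirms the two eigenvalue relations. The values of `GAMMA` and `B`, and the choice $\hbar = 1$, are hypothetical placeholders picked only so the numbers are easy to read. They are not physical constants for the electron.

```python
HBAR = 1.0     # assumption: units in which hbar = 1
GAMMA = 1.0    # hypothetical gyromagnetic ratio
B = 2.0        # hypothetical field magnitude

S_z = [[HBAR / 2, 0.0], [0.0, -HBAR / 2]]
H = [[-GAMMA * B * entry for entry in row] for row in S_z]   # H = -gamma B S_z

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

plus, minus = [1.0, 0.0], [0.0, 1.0]       # |+> and |-> in z-coordinates
E_plus = -GAMMA * B * HBAR / 2             # energy paired with |+>
E_minus = +GAMMA * B * HBAR / 2            # energy paired with |->

print(apply(H, plus), [E_plus * x for x in plus])      # H|+> versus E_plus |+>
print(apply(H, minus), [E_minus * x for x in minus])   # H|-> versus E_minus |->
```

Each printed pair should agree, showing that the $S_z$ eigenkets are also energy eigenkets, with the lower energy attached to $|+\rangle$ when $\gamma B > 0$.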


8.4.2 Allowable Energies

Knowledge of the energy eigenvectors is essential to predicting time evolution of any
state, as we are about to see. But first, we apply Trait #3 (the third postulate
of QM: allowable eigenvalues of an observable) to the Hamiltonian to make a
trait-worthy observation that is universal in quantum physics.


Trait #11 (Quantization of Energy) 


The only allowable (i.e., measurable) energies of a quantum system are the eigenvalues 
of the Hamiltonian. 




Let’s take a short side-trip to give the crucial postulate that expresses, in full
generality, how any quantum state evolves based on the system’s Hamiltonian operator,
$H$.

8.5 Sixth Postulate of Quantum Mechanics 


8.5.1 The Schrödinger Equation

The time-dependent Schrödinger Equation is the quantum-mechanical answer to the
question, “How does a state evolve over time?” To start, we replace a general (but
fixed) state with one which is a function of time,

\[ |\psi\rangle \;\longrightarrow\; |\psi(t)\rangle . \]

As a consequence, its expansion coefficients will also be complex scalar functions of
time.

\[ |\psi(t)\rangle = \sum_{k=1}^{n} c_k(t)\,|u_k\rangle . \]

Everything we did earlier still holds if we freeze time at some $t'$. We would then
evaluate the system at that instant as if it were not time-dependent and we were
working with the fixed state

\[ |\psi\rangle = |\psi(t')\rangle = \sum_{k=1}^{n} c_k\,|u_k\rangle , \qquad c_k \equiv c_k(t') . \]

To get to time $t = t'$ (or any future $t > 0$), though, we need to know the exact
formula for those coefficients, $c_k(t)$, so we can plug in $t = t'$ and produce this fixed
state. That’s where the sixth postulate of quantum mechanics comes in.


Trait #12 (The Time-Dependent Schrödinger Equation)

The time evolution of a state vector is governed by the Schrödinger Equation

\[ i\hbar\,\frac{d}{dt}\,|\psi(t)\rangle = H(t)\,|\psi(t)\rangle . \]

Notice that the Hamiltonian can change from moment-to-moment, although it does
not always do so. For our purposes, it is not time dependent, so we get a simpler
form of the Schrödinger Equation,

\[ i\hbar\,\frac{d}{dt}\,|\psi(t)\rangle = H\,|\psi(t)\rangle . \]



This is still a time-dependent equation; we merely have a simplification acknowledging
that $t$ does not appear in the matrix for $H$.

The “Other” Schrödinger Equation? In case you’re wondering whether
there’s a time-independent Schrödinger equation, the answer is, it depends on whom
you ask. Purists say, not really, but most of us consider the eigenket-eigenvalue
equation of Trait #3’,

\[ T_A\,|u_k\rangle = a_k\,|u_k\rangle , \]

when applied to the operator $A = H$, to be the time-independent Schrödinger
equation.

Solving the time-independent Schrödinger equation, which produces the energy
eigenkets and eigenvalues, is typically the first step in the process of solving the more
general time-dependent Schrödinger equation.


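8.5.2 The Evolution of Spin in a Constant Magnetic Field

You are ready to solve your first Schrödinger equation. We consider the system of a
stationary electron in a uniform z-up directed $B$-field, whose Hamiltonian we have
already “cracked” (meaning we solved the time-independent Schrödinger equation).

Because it’s so exciting, I’m going to summarize the set-up for you. The players
are

• our state vector, $|\psi(t)\rangle$, with its two expansion coefficients, the unknown
functions $c_1(t)$ and $c_2(t)$,

\[ |\psi(t)\rangle = c_1(t)\,|+\rangle + c_2(t)\,|-\rangle = \begin{pmatrix} c_1(t) \\ c_2(t) \end{pmatrix}_z , \]

• the Hamiltonian operator,

\[ H = -\gamma B\,\frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , \]

• and the Schrödinger equation that relates the two,

\[ i\hbar\,\frac{d}{dt}\,|\psi(t)\rangle = H\,|\psi(t)\rangle . \]

Let’s compute. Substitute the coordinate functions in for $|\psi\rangle$ in the Schrödinger
equation,

\[ i\hbar\,\frac{d}{dt} \begin{pmatrix} c_1(t) \\ c_2(t) \end{pmatrix} = -\gamma B\,\frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} c_1(t) \\ c_2(t) \end{pmatrix} . \]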



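This is equivalent to

\[ i\hbar \begin{pmatrix} \frac{d}{dt}\,c_1 \\ \frac{d}{dt}\,c_2 \end{pmatrix} = \begin{pmatrix} -\frac{\gamma B\hbar}{2}\,c_1 \\ +\frac{\gamma B\hbar}{2}\,c_2 \end{pmatrix} , \]

or

\[ \frac{d}{dt}\,c_1 = \frac{\gamma B\,i}{2}\,c_1 \quad\text{and}\quad \frac{d}{dt}\,c_2 = -\frac{\gamma B\,i}{2}\,c_2 . \]

From calculus we know that the solutions to

\[ \frac{dx}{dt} = k\,x , \]

with constant $k$, are the family of equations

\[ x(t) = C\,e^{kt} , \]

one solution for each complex constant $C$. (If you didn’t know that, you can verify it
now by differentiating the last equation.) The constant $C$ is determined by the initial
condition, at time $t = 0$,

\[ C = x(0) . \]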


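We have two such differential equations, one for each constant

\[ k = \pm\frac{\gamma B\,i}{2} , \]

so

\[ c_1(t) = C_1\,e^{it(\gamma B/2)} , \qquad c_2(t) = C_2\,e^{-it(\gamma B/2)} . \]

The initial condition is that

\[ |\psi(0)\rangle = |\psi\rangle , \]

our starting state. In other words, the initial conditions for our two equations are

\[ c_1(0) = \alpha_0 \quad\text{and}\quad c_2(0) = \beta_0 , \]

where we are saying $\alpha_0$ and $\beta_0$ are the two scalar coefficients of the state $|\psi\rangle$ at time
$t = 0$. That gives us the constants

\[ C_1 = \alpha_0 , \qquad C_2 = \beta_0 , \]

the complete formulas for the time-dependent coefficients,

\[ c_1(t) = \alpha_0\,e^{it(\gamma B/2)} , \qquad c_2(t) = \beta_0\,e^{-it(\gamma B/2)} , \]

and, finally, our time-dependent state in its full glory,

\[ |\psi(t)\rangle = \alpha_0\,e^{it(\gamma B/2)}\,|+\rangle + \beta_0\,e^{-it(\gamma B/2)}\,|-\rangle . \]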


We have solved the Schrödinger equation using first-quarter calculus.


If you can manage to not be too confused, let’s reduce our notation by using $c_1$ and
$c_2$ (without the parameter $(t)$), rather than $\alpha_0$ and $\beta_0$, to mean the initial coefficients
at time $t = 0$, giving us the slightly cleaner

\[ |\psi(t)\rangle = c_1\,e^{it(\gamma B/2)}\,|+\rangle + c_2\,e^{-it(\gamma B/2)}\,|-\rangle . \]


We pause to consider how this can be generalized to any situation (in the case of
finite or enumerable eigenvalues). I’ll introduce an odd notation that physicists have
universally adopted, namely that the eigenket associated with eigenvalue $a$ will be
that same $a$ inside the ket symbol, i.e., $|a\rangle$.

1. We first solved the system’s Hamiltonian (the time-independent Schrödinger
equation, if you like) to get the allowable energies, $\{E_k\}$, and their associated
eigenkets, $\{|E_k\rangle\}$.

2. Next, we expanded the initial state, $|\psi\rangle$, along the energy basis,

\[ |\psi\rangle = \sum_k c_k\,|E_k\rangle . \]

3. We solved the Schrödinger equation for each time-dependent amplitude, $c_k(t)$,
which yielded

\[ c_k(t) = c_k\,e^{-itE_k/\hbar} . \]

4. Finally, we “attached” the exponentials as factors to each of the terms in the
original sum for $|\psi\rangle$ to get the full, time-dependent state,

\[ |\psi(t)\rangle = \sum_k c_k\,e^{-itE_k/\hbar}\,|E_k\rangle . \]
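The four steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: units with $\hbar = 1$, a hypothetical value for the product $\gamma B$, and hypothetical initial amplitudes. The function names `evolve` and `schrodinger_residual` are mine. Since $H$ is diagonal in its own eigenbasis, checking the Schrödinger equation reduces to checking $i\hbar\,\frac{d}{dt}c_k = E_k\,c_k$ for each $k$, which the sketch does with a central finite difference.

```python
import cmath

HBAR = 1.0       # assumption: units in which hbar = 1
GAMMA_B = 2.0    # hypothetical value of the product gamma * B

# Energies paired with |+> and |->: E_+ = -gamma*B*hbar/2, E_- = +gamma*B*hbar/2
ENERGIES = [-GAMMA_B * HBAR / 2, +GAMMA_B * HBAR / 2]

def evolve(initial, energies, t):
    """Attach the phase factor exp(-i t E_k / hbar) to each initial amplitude c_k."""
    return [c * cmath.exp(-1j * t * E / HBAR) for c, E in zip(initial, energies)]

def schrodinger_residual(initial, energies, t, h=1e-5):
    """Largest |i hbar dc_k/dt - E_k c_k(t)|, using a central finite difference."""
    ahead = evolve(initial, energies, t + h)
    behind = evolve(initial, energies, t - h)
    now = evolve(initial, energies, t)
    return max(abs(1j * HBAR * (a - b) / (2 * h) - E * c)
               for a, b, c, E in zip(ahead, behind, now, energies))

initial = [0.6, 0.8j]    # hypothetical normalized amplitudes (c_1, c_2) at t = 0
state = evolve(initial, ENERGIES, t=0.7)

print(state)                                            # amplitudes at t = 0.7
print(sum(abs(c) ** 2 for c in state))                  # total probability
print(schrodinger_residual(initial, ENERGIES, t=0.7))   # should be tiny
```

The total probability should stay at 1 for every `t`, because each phase factor has modulus 1. The residual should be small but not exactly zero, since it comes from a numerical derivative.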


This will be packaged into a trait in a minute. But first ... 




8.5.3 Stationary States 


Notice what this implies if our initial state happens to be one of the energy eigenstates.
In our spin-1/2 system that would be either $|+\rangle$ or $|-\rangle$. Take $|\psi\rangle = |+\rangle$. The
result is

\[ |\psi(t)\rangle = e^{it(\gamma B/2)}\,|+\rangle . \]

Introducing the shorthand $\phi_t \equiv \gamma B t/2$, the time evolution merely causes a “phase
factor” of $e^{i\phi_t}$ to appear. But remember that our state-space does not differentiate
between scalar multiples of state vectors, so

\[ |\psi(t)\rangle = e^{i\phi_t}\,|+\rangle \;\cong\; |+\rangle = |\psi\rangle . \]

The state does not change over time.

Looked at another way, if you start out in state $|+\rangle$ and measure total energy, $H$,
you get $-\frac{\gamma B\hbar}{2}$ with certainty (which we already knew, since the coefficient of $|+\rangle$ is 1,
and $|1|^2 = 1$). This also means that the state into which $|+\rangle$ evolves over time $t$ also
has 100% probability of remaining $|+\rangle$, as the coefficient of the $|+\rangle$ reveals:

\[ \big| e^{i\phi_t} \big|^2 = \big( e^{i\phi_t} \big)^*\, e^{i\phi_t} = e^{-i\phi_t}\, e^{i\phi_t} = e^0 = 1 . \]

Its one and only expansion coefficient changes by a factor of $e^{i\phi_t}$ whose square
magnitude is 1 regardless of $t$. This is big enough to call a trait.


Trait #13 (Stationary States)

An eigenstate of the Hamiltonian operator evolves in such a way that its measurement
outcome does not change; it remains in the same eigenstate.


For this reason, eigenstates are often called stationary states of the system. 


8.5.4 General (Non-Stationary) States 


Does this mean that we can throw away the phase factors $e^{i\phi_t}$ for times $t \neq 0$? It
may surprise you that the answer is a big fat no. Let’s take the example of an initial
state that is not an energy eigenket. One of the y-eigenstates will suffice,

\[ |\psi\rangle = |-\rangle_y = \frac{|+\rangle - i\,|-\rangle}{\sqrt 2} . \]

The two initial coefficients are

\[ c_1 = \frac{1}{\sqrt 2} \quad\text{and}\quad c_2 = \frac{-i}{\sqrt 2} . \]




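Allow this state to evolve for a time $t$ according to the Schrödinger equation,

\[ |\psi(t)\rangle = \frac{e^{it(\gamma B/2)}\,|+\rangle - i\,e^{-it(\gamma B/2)}\,|-\rangle}{\sqrt 2} . \]

The state is unchanged if we multiply by an overall scalar multiple, so let’s turn
the $|+\rangle$ coefficient real by multiplying the entire fraction by its complex conjugate,
$e^{-it(\gamma B/2)}$, giving the equivalent state

\[ |\psi(t)\rangle \;\cong\; \frac{|+\rangle - i\,e^{-it\gamma B}\,|-\rangle}{\sqrt 2} , \]

whose coefficients are

\[ c_1(t) = \frac{1}{\sqrt 2} \quad\text{and}\quad c_2(t) = \frac{-i\,e^{-it\gamma B}}{\sqrt 2} . \]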

Comparing this to the original state, we see that while the coefficient of $|+\rangle$ never
changes, the coefficient of $|-\rangle$ changes depending on $t$. To dramatize this, consider
time $t' = \pi/(\gamma B)$ (an impossibly tiny fraction of a second if you were to compute it
using real values for $\gamma$ and a typical $B$-field).

\[ |\psi(t')\rangle = \frac{|+\rangle - i\,e^{-i\pi}\,|-\rangle}{\sqrt 2} = \frac{|+\rangle + i\,|-\rangle}{\sqrt 2} = |+\rangle_y . \]

We started out in state $|-\rangle_y$, and a fraction of a second later found ourselves in state
$|+\rangle_y$. That’s a pretty dramatic change. If we were to measure $S_y$ initially, we would,
with certainty, receive a $-\hbar/2$ reading. Wait a mere $\pi/(\gamma B)$ seconds, and a
measurement would result, with equal certainty, in the opposite value $+\hbar/2$.
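Here is a minimal sketch in plain Python of the flip just described. It assumes units with $\hbar = 1$ and a hypothetical value for $\gamma B$. The helper `same_state` is my own shorthand for "equal up to an overall phase", tested by checking that the inner product has modulus 1. Note that the sketch keeps the raw phase factors $e^{\pm it(\gamma B/2)}$ rather than the cleaned-up form, so the evolved vector differs from $|+\rangle_y$ by a harmless overall phase.

```python
import cmath
import math

HBAR = 1.0       # assumption: units in which hbar = 1
GAMMA_B = 3.0    # hypothetical value of the product gamma * B
r = 1 / math.sqrt(2)

ENERGIES = [-GAMMA_B * HBAR / 2, +GAMMA_B * HBAR / 2]   # for |+> and |->
minus_y = [r, -1j * r]     # |->_y in z-coordinates
plus_y = [r, +1j * r]      # |+>_y in z-coordinates

def evolve(initial, t):
    """Multiply each energy-basis amplitude by exp(-i t E_k / hbar)."""
    return [c * cmath.exp(-1j * t * E / HBAR) for c, E in zip(initial, ENERGIES)]

def inner(u, v):
    """Complex inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def same_state(u, v, tol=1e-12):
    """True when two unit vectors differ only by an overall phase factor."""
    return abs(abs(inner(u, v)) - 1.0) < tol

t_flip = math.pi / GAMMA_B
psi = evolve(minus_y, t_flip)

print(psi)                         # a unit-modulus multiple of |+>_y
print(same_state(psi, plus_y))     # has |->_y become |+>_y?
print(same_state(psi, minus_y))    # is it still |->_y?
```

The stationary-state claim can be probed the same way: `same_state(evolve([1, 0], t), [1, 0])` should hold for any `t`.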


8.5.5 General Technique for Computing Time-Evolved States 


We’ve set the stage for a technique that applies throughout quantum mechanics. We'll
call it a trait.

Trait #14 (Evolution of Any Observable)

To determine the time-evolved probability of the outcome of any observable, $A$,
starting in the initial state $|\psi\rangle$,

1. compute the energy eigenvalues and eigenkets for the system, $\{ E_k \leftrightarrow |E_k\rangle \}$,
by solving the “time-independent” Schrödinger equation, $H|E_k\rangle = E_k\,|E_k\rangle$,

2. expand $|\psi\rangle$ along the energy basis: $|\psi\rangle = \sum_k c_k\,|E_k\rangle$,

3. attach (as factors) the time-dependent phase factors, $e^{-itE_k/\hbar}$, to each term,
$|\psi(t)\rangle = \sum_k c_k\,e^{-itE_k/\hbar}\,|E_k\rangle$,

4. “dot” this expression with the desired eigenket, $|u_j\rangle$ of $A$, to find its amplitude,
$\alpha_j(t) = \langle u_j|\psi(t)\rangle$, and

5. the magnitude-squared of this amplitude, $|\alpha_j(t)|^2$, will be the probability of an
$A$-measurement producing the eigenvalue $a_j$ at time $t$.


The short version is that for any observable of interest, $A$, you first solve the
system’s time-independent Schrödinger equation and use its energy eigenbasis to express
your state, $|\psi\rangle$. Incorporate the time dependence into that expression and “dot” that
with the desired eigenstate of $A$ to get your amplitude and, ultimately, probability.

(I keep using quotes with the verb “dot”, because this is really an inner-product,
rather than a real dot-product, requiring the left vector’s coordinates to be
conjugated.)


Example 


We continue to examine the evolution of an electron that starts in the y-down state,
$|-\rangle_y$. We've already done the first three steps of Trait #14 and found that after
time $t$ the state $|-\rangle_y$ evolves to

\[ |(-)_t\rangle_y = \frac{|+\rangle - i\,e^{-it\gamma B}\,|-\rangle}{\sqrt 2} . \]

(I used $|(-)_t\rangle_y$, rather than the somewhat confusing $|-(t)\rangle_y$, to designate the state’s
dependence on time.)

That’s the official answer to the question “how does $|-\rangle_y$ evolve?” but to see how
we would use this information, we have to pick an observable we are curious about
and apply Trait #14, steps 4 and 5.

Let’s ask about $S_y$, the y-projection of spin, and specifically the probability of
measuring a $|+\rangle_y$ at time $t$. Step 4 says to “dot” the time-evolved state with the vector
$|+\rangle_y$, so the amplitude (step 4) is

\[ c_{+y} = {}_y\langle +\,|\,(-)_t\rangle_y . \]

I’ll help you read it: the “left” vector of the inner product is the $+\frac{\hbar}{2}$ eigenket of the
operator $S_y$, $|+\rangle_y$, independent of time. The “right” vector of the inner product is
our starting state, $|-\rangle_y$, but evolved to a later time, $t$.


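Because everything is expressed in terms of the z-basis, we have to be sure we
stay in that realm. The z-coordinates of $|+\rangle_y$ are obtained from our familiar

\[ |+\rangle_y = \frac{|+\rangle + i\,|-\rangle}{\sqrt 2} = \begin{pmatrix} 1/\sqrt 2 \\ i/\sqrt 2 \end{pmatrix}_z . \]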




I added the subscript z on the RHS to emphasize that we are displaying the vector 
|+) , in the z-coordinates, as usual. If we are to use this on the left side of a complex 
inner product we have to take the conjugate of all components. This is easy to see in 
the coordinate form, 


but let’s see how we can avoid looking inside the vector by applying our adjoint 
conversion rules to the expression defining |+), to create a bra for this vector. I'll 
give you the result, and you can supply the (very few) details as an ... 


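[Exercise. Show that

\[
{}_y\langle +| \;=\; \left( |+\rangle_y \right)^{\dagger} \;=\; \frac{\langle +| \;-\; i\,\langle -|}{\sqrt{2}} . \;]
\]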


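Getting back to the computation of the amplitude, $c_{+y}$, substitute the computed
values into the inner product to get

\[
\begin{aligned}
c_{+y} \;&=\; \left( \frac{\langle +| - i\,\langle -|}{\sqrt{2}} \right)
              \left( \frac{e^{it(\gamma B/2)}\,|+\rangle - i\,e^{-it(\gamma B/2)}\,|-\rangle}{\sqrt{2}} \right) \\
        &=\; \frac{e^{it(\gamma B/2)}\,\langle +|+\rangle \;+\; i^{2}\,e^{-it(\gamma B/2)}\,\langle -|-\rangle}{2} \\
        &=\; \frac{e^{it(\gamma B/2)} \;-\; e^{-it(\gamma B/2)}}{2} .
\end{aligned}
\]

(The last two equalities made use of the orthonormality of any observable eigenbasis
(Trait #4).)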


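Finally, step 5 says that the probability of measuring $S_y = +\hbar/2$ at any time t is

\[
|c_{+y}|^{2} \;=\; c_{+y}^{*}\,c_{+y} \;=\;
\left( \frac{e^{-i\gamma B t/2} - e^{i\gamma B t/2}}{2} \right)
\left( \frac{e^{i\gamma B t/2} - e^{-i\gamma B t/2}}{2} \right) ,
\]

where we used the fact (see the complex number lecture) that, for real $\theta$,

\[
\left( e^{i\theta} \right)^{*} \;=\; e^{-i\theta} .
\]

Let’s simplify the notation by setting

\[
\theta \;\equiv\; \gamma B t
\]

and complete the computation with that substitution,

\[
\begin{aligned}
|c_{+y}|^{2} \;&=\; \frac{2 \;-\; e^{i\theta} \;-\; e^{-i\theta}}{4} \\
             &=\; \frac{1}{2} \;-\; \frac{1}{2}\left( \frac{e^{i\theta} + e^{-i\theta}}{2} \right) \\
             &=\; \frac{1}{2} \;-\; \frac{1}{2}\cos\theta .
\end{aligned}
\]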


Undoing the substitution gives us the final result

\[
P\!\left( S_y(t) = +\tfrac{\hbar}{2} \right) \;=\; \frac{1}{2} \;-\; \frac{1}{2}\cos(\gamma B t) .
\]

As you can see, the probability of measuring an up-y state oscillates between 0 and
1 sinusoidally over time. Note that this is consistent with our initial state at time
t = 0: cos 0 = 1, so the probability of measuring $+\hbar/2$ is zero; it had to be since we
started in state $|-\rangle_y$, and when you are in an eigenstate ($|-\rangle_y$), the measurement of the
observable corresponding to that eigenstate ($S_y$) is guaranteed to be the eigenstate’s
eigenvalue ($-\hbar/2$). Likewise, if we test precisely at $t = \pi/(\gamma B)$, we get $\frac12 - \frac12(-1) = 1$,
a certainty that we will detect $+\hbar/2$, the $|+\rangle_y$ eigenvalue.

We can stop the clock at times between those two extremes to get any probability 
we like. 
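
If you would like to see steps 4 and 5 carried out numerically, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names and the parameter gamma_b (standing for the product γB, set to 1 by default) are mine, not part of the lecture. It builds the z-coordinates of the evolved y-down state, takes the complex inner product with the z-coordinates of the y-up eigenket, and compares the squared magnitude with the closed-form answer.

```python
import cmath
import math


def prob_plus_y(t, gamma_b=1.0):
    """Probability of measuring S_y = +hbar/2 at time t, starting in the y-down state.

    gamma_b stands for the product gamma*B (an illustrative parameter name).
    """
    phase = gamma_b * t / 2
    # z-basis coordinates of the time-evolved y-down state
    evolved = (cmath.exp(1j * phase) / math.sqrt(2),
               -1j * cmath.exp(-1j * phase) / math.sqrt(2))
    # z-basis coordinates of the y-up eigenket
    plus_y = (complex(1 / math.sqrt(2)), 1j / math.sqrt(2))
    # Step 4: complex inner product (conjugate the left vector's coordinates)
    amplitude = sum(a.conjugate() * b for a, b in zip(plus_y, evolved))
    # Step 5: the probability is the squared magnitude of the amplitude
    return abs(amplitude) ** 2


def closed_form(t, gamma_b=1.0):
    """The formula derived above: 1/2 - (1/2) cos(gamma*B*t)."""
    return 0.5 - 0.5 * math.cos(gamma_b * t)


for t in (0.0, math.pi / 2, math.pi):
    print(f"t = {t:.4f}  numeric = {prob_plus_y(t):.6f}  formula = {closed_form(t):.6f}")
```

With gamma_b = 1, the two columns should agree at every time, starting at 0 for t = 0 and reaching 1 at t = π.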


[Exercise. What is the probability of measuring $S_y(t) = +\hbar/2$ at the (chronologically
ordered) times

(a) $t = \pi/(6\gamma B)$,
(b) $t = \pi/(4\gamma B)$,
(c) $t = \pi/(3\gamma B)$,
(d) $t = \pi/(2\gamma B)$. ]


[Exercise. Do the same analysis to get the probability that $S_y$ measured at time
t will be $-\hbar/2$. Confirm that at any time t, the two probabilities add to 1.]


8.6 Larmor Precession 


We complete this lecture by combining expectation value with time evolution to get 
a famous result which tells us how we can relate the 3-dimensional real vector of 
a classical angular momentum vector to the quantum spin-1/2 electron, which is a 
2-dimensional complex vector. 


8.6.1 The Time-Evolved Spin State in a Uniform B-Field 


We assume that we have many electrons in the same initial spin state, and we look
at that state’s expansion along the z-basis,

\[
|\psi\rangle \;=\; c_1\,|+\rangle \;+\; c_2\,|-\rangle ,
\]

where, by normalization we know that

\[
|c_1|^{2} + |c_2|^{2} \;=\; 1 .
\]

We let the state (of any one of these systems, since they are all the same) evolve for a
time t. We have already solved the Schrödinger equation and found that the evolved
state at that time will be

\[
|\psi(t)\rangle \;=\; c_1\,e^{it(\gamma B/2)}\,|+\rangle \;+\; c_2\,e^{-it(\gamma B/2)}\,|-\rangle
\;=\; \begin{pmatrix} c_1\,e^{it(\gamma B/2)} \\ c_2\,e^{-it(\gamma B/2)} \end{pmatrix} .
\]

This, then, is the state at time t, prior to measurement.


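Rewriting $|\psi(t)\rangle$


It will help to represent this state by an equivalent vector that is a mere unit scalar
multiple of itself. To do that, we first express $c_1$ and $c_2$ in polar form,

\[
c_1 \;=\; c\,e^{i\phi_1} \quad\text{and}\quad c_2 \;=\; s\,e^{i\phi_2} ,
\]

giving the equivalent state

\[
|\psi(t)\rangle \;=\; \begin{pmatrix} c\,e^{i\phi_1}\,e^{it(\gamma B/2)} \\ s\,e^{i\phi_2}\,e^{-it(\gamma B/2)} \end{pmatrix} .
\]

Then we multiply by the unit scalar $e^{-i(\phi_1+\phi_2)/2}$ to get a more balanced equivalent
state,

\[
|\psi(t)\rangle \;=\; \begin{pmatrix} c\,e^{i(\phi_1-\phi_2)/2}\,e^{it(\gamma B/2)} \\ s\,e^{-i(\phi_1-\phi_2)/2}\,e^{-it(\gamma B/2)} \end{pmatrix} .
\]

Now, we simplify by making the substitutions

\[
\omega \;\equiv\; \gamma B \quad\text{and}\quad \phi_0 \;\equiv\; \phi_1 - \phi_2 ,
\]

to get the simple and balanced Hilbert space representative of our state,

\[
|\psi(t)\rangle \;=\; \begin{pmatrix} c\,e^{i\phi_0/2}\,e^{it\omega/2} \\ s\,e^{-i\phi_0/2}\,e^{-it\omega/2} \end{pmatrix}
\;=\; \begin{pmatrix} c\,e^{i(t\omega+\phi_0)/2} \\ s\,e^{-i(t\omega+\phi_0)/2} \end{pmatrix} .
\]

We get a nice simplification by using the notation

\[
\phi(t) \;\equiv\; \frac{\omega t + \phi_0}{2}
\]

to express our evolving state very concisely as

\[
|\psi(t)\rangle \;=\; \begin{pmatrix} c\,e^{i\phi(t)} \\ s\,e^{-i\phi(t)} \end{pmatrix} .
\]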
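
If the phrase “equivalent vector” feels slippery, the following Python sketch may help. It is a hedged illustration, not part of the original derivation: the function names, the sample amplitudes and the choice ω = 1 are all assumptions of mine. It builds the evolved state directly from two complex amplitudes, builds the balanced form from the moduli and the relative phase, and checks that the two differ only by a single scalar of modulus 1.

```python
import cmath


def evolved_state(c1, c2, t, omega=1.0):
    """Evolved state written with the original complex amplitudes c1, c2."""
    return (c1 * cmath.exp(1j * omega * t / 2),
            c2 * cmath.exp(-1j * omega * t / 2))


def balanced_state(c1, c2, t, omega=1.0):
    """Balanced representative (c e^{i phi(t)}, s e^{-i phi(t)})."""
    c, s = abs(c1), abs(c2)                    # real moduli
    phi0 = cmath.phase(c1) - cmath.phase(c2)   # relative phase phi_1 - phi_2
    phi_t = (omega * t + phi0) / 2
    return (c * cmath.exp(1j * phi_t), s * cmath.exp(-1j * phi_t))


# Illustrative normalized amplitudes: moduli 0.6 and 0.8, phases 0.3 and -1.1
c1 = cmath.rect(0.6, 0.3)
c2 = cmath.rect(0.8, -1.1)

original = evolved_state(c1, c2, t=2.0)
balanced = balanced_state(c1, c2, t=2.0)

# Component-wise ratios: both should be the same unit scalar
ratios = [a / b for a, b in zip(original, balanced)]
print(ratios, [abs(r) for r in ratios])
```

Both ratios should come out equal and of modulus 1, which is exactly what it means for the two vectors to represent the same physical state.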


A Convenient Angle

There is one last observation before we start to compute. Since $|\psi\rangle$ is normalized,

\[
|c|^{2} + |s|^{2} \;=\; 1 ,
\]

the amplitudes c and s have moduli (absolute values) that are consistent with the
sine and cosine of some angle. Furthermore, we can name that angle anything we
like. Call it “$\theta/2$” for reasons that will become clear in about 60 seconds. [Start of
60 seconds.]


We have proclaimed the angle $\theta$ to be such that

\[
c \;=\; \cos\frac{\theta}{2} \quad\text{and}\quad s \;=\; \sin\frac{\theta}{2} ,
\]

which is why I named the amplitudes c and s.

Also, we’re going to run into the two expressions $cs$ and $c^2 - s^2$ a little later,
so let’s see if we can write those in terms of our angle $\theta$. The addition law of sines
implies that

\[
cs \;=\; \cos\frac{\theta}{2}\,\sin\frac{\theta}{2} \;=\; \frac{\sin\theta}{2} ,
\]

while the addition law of cosines yields

\[
c^{2} - s^{2} \;=\; \cos^{2}\frac{\theta}{2} - \sin^{2}\frac{\theta}{2} \;=\; \cos\theta .
\]

By letting $\theta/2$ be the common angle that we used to represent c and s (instead of,
say, $\theta$) we ended up with plain old $\theta$ on the RHS of these formulas, which is the form
we’ll need. [End of 60 seconds.]


Although we’ll start out using c and s for the moduli of $|\psi\rangle$’s amplitudes, we’ll
eventually want to make these substitutions when the time comes. The angle $\theta$ will
have a geometric significance.


8.6.2 Evolution of the Spin Expectation Values 

Experiment #4: Measuring $|\psi(t)\rangle$ at Time t


We ask the physicists to measure one of $S_z$, $S_x$ or $S_y$ at time t. Now they can
only measure one of those observables per electron, because once they do, $|\psi(t)\rangle$ will
collapse into one of the six basis states, $|\pm\rangle$, $|\pm\rangle_x$ or $|\pm\rangle_y$, after which time, it becomes
useless. But that’s okay; we’re swimming in $|\psi(t)\rangle$ electrons. We pick a very large
number, N (say N is a million). We test 3N electrons, measuring $S_z$ on the first
N, $S_x$ on the second N, and $S_y$ on the third N. We will have measured the state
$|\psi(t)\rangle$ 3N or 3 million times. The physicists record the measurements producing a
certain number of +z values, a certain number of −z, etc. We ask them to compute
the average of the $S_z$ results, a number between $-\hbar/2$ and $+\hbar/2$, and the same with the
$S_x$ and $S_y$ results.


So they’ll have three numbers in the end, $m_x$, $m_y$ and $m_z$.


But hold on a second. We don’t have to bother the physicists, because when N
is large, we know from the law of large numbers that the average values of each
of the three spin projections are approximated very closely by the expectation values
of the operators. So, let’s compute those instead of wasting a lot of time and money.


The Expectation Values at Time t

We do this for each observable $S_z$, $S_x$ and $S_y$, individually, then find a way to combine
the answers. We’ll begin with $\langle S_z(t)\rangle$.

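A. The Expectation Value for the z-spin Observable: $\langle S_z(t)\rangle$

Trait #9 (the expectation value theorem), tells us that we can compute this using

\[
\langle \psi(t)\,|\,S_z\,|\,\psi(t)\rangle .
\]

With the help of Trait #8 (the adjoint conversion rules) and keeping in mind that
c and s are real, we find

\[
\langle \psi(t)\,|\,S_z\,|\,\psi(t)\rangle
\;=\; \left( c\,e^{-i\phi(t)},\; s\,e^{i\phi(t)} \right)\,\frac{\hbar}{2}
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\begin{pmatrix} c\,e^{i\phi(t)} \\ s\,e^{-i\phi(t)} \end{pmatrix}
\;=\; \frac{\hbar}{2}\left( c^{2} - s^{2} \right)
\;=\; \frac{\hbar}{2}\cos\theta .
\]

We can draw some quick conclusions from this (and subtler ones later).


1. The expectation value of $S_z$ does not change with time.


2. If $|\psi\rangle = |+\rangle$, i.e., c = 1 and s = 0, the expectation value is $+\hbar/2$ as it must, since
$|+\rangle$ is a stationary state and so always yields a (+) measurement.


3. If $|\psi\rangle = |-\rangle$, i.e., c = 0 and s = 1, the expectation value is $-\hbar/2$, again consistent
with $|-\rangle$ being the other stationary state, the one that always yields a (−)
measurement.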


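B. The Expectation Value for the x-spin Observable: $\langle S_x(t)\rangle$


We compute

\[
\langle \psi(t)\,|\,S_x\,|\,\psi(t)\rangle .
\]

Using our adjoint conversion rules again, we find

\[
\begin{aligned}
\langle \psi(t)\,|\,S_x\,|\,\psi(t)\rangle
\;&=\; \left( c\,e^{-i\phi(t)},\; s\,e^{i\phi(t)} \right)\,\frac{\hbar}{2}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} c\,e^{i\phi(t)} \\ s\,e^{-i\phi(t)} \end{pmatrix} \\
&=\; \left( c\,e^{-i\phi(t)},\; s\,e^{i\phi(t)} \right)\,\frac{\hbar}{2}
\begin{pmatrix} s\,e^{-i\phi(t)} \\ c\,e^{i\phi(t)} \end{pmatrix} \\
&=\; \frac{\hbar}{2}\,cs\left( e^{-2i\phi(t)} + e^{2i\phi(t)} \right) .
\end{aligned}
\]

This is nice, but a slight rearrangement should give you a brilliant idea,

\[
\langle \psi(t)\,|\,S_x\,|\,\psi(t)\rangle \;=\; cs\hbar\;\frac{e^{2i\phi(t)} + e^{-2i\phi(t)}}{2} .
\]

Look back at our lesson on complex numbers, especially the consequences of the Euler
formula, and you’ll discover that the fraction simplifies to $\cos(2\phi(t))$. Now we have

\[
\langle \psi(t)\,|\,S_x\,|\,\psi(t)\rangle \;=\; cs\hbar\,\cos\left( 2\phi(t) \right) ,
\]

which, after undoing our substitutions for $cs$ and $\phi(t)$ we set up in the convenient
angle section, looks like

\[
\langle \psi(t)\,|\,S_x\,|\,\psi(t)\rangle \;=\; cs\hbar\,\cos(\omega t + \phi_0) \;=\; \frac{\hbar}{2}\,\sin\theta\,\cos(\omega t + \phi_0) .
\]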


Observe these consequences.


1. The expectation value of $S_x$ varies sinusoidally with time at a frequency $\omega = \gamma B$,
not counting the situation in item 2.


2. If $|\psi\rangle$ is either $|+\rangle$ or $|-\rangle$, then $cs = 0$ producing an expectation value of 0 at
all times. This is consistent with the following two facts.


• These two z-eigenkets, when expressed in the x-basis, have equal “doses”
($1/\sqrt{2}$) of $|+\rangle_x$ and $|-\rangle_x$, as you can tell from

\[
|\pm\rangle \;=\; \frac{|+\rangle_x \;\pm\; |-\rangle_x}{\sqrt{2}} ,
\]

so we would expect a roughly equal collapse into the $|+\rangle_x$ and $|-\rangle_x$ states,
averaging to 0.


• We’ve already established that the two kets $|\pm\rangle$ are stationary states of H,
so whatever holds at time t = 0, holds for all time.


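C. The Expectation Value for the y-spin Observable: $\langle S_y(t)\rangle$

We compute

\[
\langle \psi(t)\,|\,S_y\,|\,\psi(t)\rangle .
\]

The calculation proceeds much like that of $\langle S_x\rangle$.

\[
\langle \psi(t)\,|\,S_y\,|\,\psi(t)\rangle
\;=\; \left( c\,e^{-i\phi(t)},\; s\,e^{i\phi(t)} \right)\,\frac{\hbar}{2}
\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
\begin{pmatrix} c\,e^{i\phi(t)} \\ s\,e^{-i\phi(t)} \end{pmatrix} .
\]

Rearranging and applying one of our Euler formulas, we find

\[
\begin{aligned}
\langle \psi(t)\,|\,S_y\,|\,\psi(t)\rangle
\;&=\; cs\hbar\;\frac{e^{-2i\phi(t)} - e^{2i\phi(t)}}{2i} \\
&=\; -\,cs\hbar\,\sin\left( 2\phi(t) \right) \\
&=\; -\,cs\hbar\,\sin(\omega t + \phi_0) \;=\; -\frac{\hbar}{2}\,\sin\theta\,\sin(\omega t + \phi_0) .
\end{aligned}
\]

Again, some observations.


1. The expectation value of $S_y$ varies sinusoidally with time at a frequency $\omega = \gamma B$,
not counting the situation in item 2.


2. If $|\psi\rangle$ is either $|+\rangle$ or $|-\rangle$, then $cs = 0$ yielding an expectation value of 0 for
all time. Again, this is consistent with these two z-eigenkets being stationary
states and having y-basis amplitudes of equal magnitude ($1/\sqrt{2}$):

\[
|+\rangle \;=\; \frac{|+\rangle_y + |-\rangle_y}{\sqrt{2}} \quad\text{and}\quad
|-\rangle \;=\; \frac{|+\rangle_y - |-\rangle_y}{\sqrt{2}\,i}
\]

(from a prior exercise).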


The Expectation Vector and Larmor Precession 


In classical mechanics, Larmor precession describes the way in which a magnetic 
dipole’s axis — a real 3-dimensional vector — revolves about a magnetic field vector. 




In contrast, the quantum mechanical spin state-vector lives in a 2-dimensional Hilbert
space, not 3-dimensional real space, so we don’t have simultaneously measurable x, y,
and z-components which we can study. However, we can define a real (and evolving)
3-dimensional vector s(t) to be

\[
\mathbf{s}(t) \;\equiv\; \begin{pmatrix} \langle S_x(t)\rangle \\ \langle S_y(t)\rangle \\ \langle S_z(t)\rangle \end{pmatrix} .
\]

This s(t) is a true 3-dimensional (time-dependent) vector whose real coordinates are
the three expectation values, $\langle S_x\rangle$, $\langle S_y\rangle$ and $\langle S_z\rangle$, at time t.


In the previous section we showed that

\[
\mathbf{s}(t) \;=\;
\begin{pmatrix}
\frac{\hbar}{2}\,\sin\theta\,\cos(\omega t + \phi_0) \\
-\frac{\hbar}{2}\,\sin\theta\,\sin(\omega t + \phi_0) \\
\frac{\hbar}{2}\,\cos\theta
\end{pmatrix}
\;=\; \frac{\hbar}{2}
\begin{pmatrix}
\sin\theta\,\cos(\omega t + \phi_0) \\
-\sin\theta\,\sin(\omega t + \phi_0) \\
\cos\theta
\end{pmatrix} .
\]

If this is not speaking to you, drop the factor of $\hbar/2$ and set $\phi(t) = \omega t + \phi_0$. What
we get is the 3-dimensional vector

\[
\mathbf{s}(t) \;\propto\;
\begin{pmatrix}
\sin\theta\,\cos\phi(t) \\
-\sin\theta\,\sin\phi(t) \\
\cos\theta
\end{pmatrix} .
\]

It is a unit vector in $\mathbb{R}^3$ whose spherical coordinates are $(1, \theta, \phi(t))$, i.e., it has a polar
angle $\theta$ and azimuthal angle $\phi(t)$. (I don’t use exclamation points, but if I did, I would
use one here.) We are looking at a vector that has a fixed z-coordinate, but whose
x and y-coordinates are in a clockwise circular orbit around the origin (clockwise
because of the y-coordinate’s minus sign, see the [Exercise]). This is called precession.
Since our B-field was defined to point in the +z direction, we have discovered the
meaning of the vector $\mathbf{s}(t) = \left( \langle S_x(t)\rangle,\, \langle S_y(t)\rangle,\, \langle S_z(t)\rangle \right)^{t}$.


This is something that recurs with regularity in quantum physics. The quan- 
tum state vectors, themselves, do not behave like classical vectors. However, their 
expectation values do. 
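
To watch the expectation vector precess without calling the physicists, here is a small Python sketch. It rests on assumptions of my own: ħ is set to 1, the function names are hypothetical, and the spin operators are taken to be ħ/2 times the usual Pauli matrices in the z-basis. It computes the three expectation values directly from the balanced state and compares them with the closed-form vector.

```python
import cmath
import math

HBAR = 1.0  # assumption: work in units where hbar = 1

# Pauli matrices in the z-basis; each spin operator is (HBAR / 2) times one of these
PAULI_X = ((0, 1), (1, 0))
PAULI_Y = ((0, -1j), (1j, 0))
PAULI_Z = ((1, 0), (0, -1))


def expectation(pauli, state):
    """<psi| S |psi> for S = (HBAR / 2) * pauli and a normalized 2-component state."""
    applied = [sum(pauli[r][k] * state[k] for k in range(2)) for r in range(2)]
    value = sum(state[r].conjugate() * applied[r] for r in range(2))
    return (HBAR / 2) * value.real  # Hermitian operator, so the value is real


def spin_vector(theta, phi0, omega, t):
    """(<S_x>, <S_y>, <S_z>) computed directly from the balanced state."""
    phi_t = (omega * t + phi0) / 2
    state = (math.cos(theta / 2) * cmath.exp(1j * phi_t),
             math.sin(theta / 2) * cmath.exp(-1j * phi_t))
    return tuple(expectation(p, state) for p in (PAULI_X, PAULI_Y, PAULI_Z))


def spin_vector_formula(theta, phi0, omega, t):
    """The closed form: (hbar/2)(sin(theta) cos(a), -sin(theta) sin(a), cos(theta))."""
    a = omega * t + phi0
    return (HBAR / 2 * math.sin(theta) * math.cos(a),
            -HBAR / 2 * math.sin(theta) * math.sin(a),
            HBAR / 2 * math.cos(theta))


for t in (0.0, 0.5, 1.0):
    print(spin_vector(1.0, 0.3, 2.0, t), spin_vector_formula(1.0, 0.3, 2.0, t))
```

The z-component should stay fixed while the x and y components trace out a circle, in agreement with the formula at every sampled time.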


8.6.3 Summary of Larmor Precession 


When a motionless electron is initially in spin state

\[
|\psi\rangle \;=\; \begin{pmatrix} c\,e^{i\phi_1} \\ s\,e^{i\phi_2} \end{pmatrix}
\]

within a uniform magnetic field B pointing in the +z direction, the Schrödinger
equation tells us that its expectation value vector, s(t), evolves according to the formula

\[
\mathbf{s}(t) \;=\; \frac{\hbar}{2}
\begin{pmatrix}
\sin\theta\,\cos(\omega t + \phi_0) \\
-\sin\theta\,\sin(\omega t + \phi_0) \\
\cos\theta
\end{pmatrix} .
\]

$\theta$, $\omega$ and $\phi_0$ are parameters related to $|\psi\rangle$ and B as follows:


• $\theta$ is the polar angle that the real vector, s(t), makes with the z-axis in our $\mathbb{R}^3$.
It relates to $|\psi\rangle$ in that $\theta/2$ is the angle that expresses the magnitudes c and s
of $|\psi\rangle$’s Hilbert-space coordinates,

\[
c \;=\; \cos\frac{\theta}{2} \quad\text{and}\quad s \;=\; \sin\frac{\theta}{2} .
\]

$\theta/2$ ranges from 0 (when it defines the up-z state, $|+\rangle$) to $\pi/2$ (when it defines
the down-z state, $|-\rangle$), allowing $\theta$ to range from 0 to $\pi$ in $\mathbb{R}^3$.


• $\omega$ is the Larmor frequency, defined by the magnitude of the B-field and the
constant, $\gamma$ (the gyromagnetic ratio),

\[
\omega \;\equiv\; \gamma B .
\]


• $\phi_0$ is the relative phase of the two amplitudes,

\[
\phi_0 \;\equiv\; \phi_1 - \phi_2 .
\]


8.7 The End and the Beginning 


This completes our tutorial on both time-independent and time-evolving quantum 
mechanics. If you studied this last lesson as part of the required reading for CS 83B, 
you are ready to move on to the next phase in that course. If you chose to read it as 
an optional section in CS 83A, good for you. It’s time to start learning about qubits. 




Chapter 9 


The Qubit 


\[
|\psi\rangle \;=\; \alpha\,|0\rangle \;+\; \beta\,|1\rangle
\]


9.1 Bits and Qubits as Vector Spaces 


All the hard work is done. You have mastered the math, spin-physics and computa- 
tional quantum mechanics needed to begin “doing” quantum computer science. 


While useful to our understanding of quantum mechanics, the spin-1/2 physical
system, $\mathscr{S}$, is no longer useful to us as computer scientists. We will work entirely
inside its state space H from this point forward, applying the postulates and traits
of quantum mechanics directly to that abstract system, indifferent to the particular
$\mathscr{S}$ that the Hilbert space H is modeling.


Although in practical terms a quantum bit, or qubit, is a variable superposition of 
two basis states in H of the form shown at the top of this chapter, formally qubits — 
as well as classical bits — are actually vector spaces. You read that correctly. A single 
qubit is not a vector, but an entire vector space. We’ll see how that works shortly. 


To get oriented, we’ll do a quick review of classical bits in the new formalism, then 
we'll retrace our steps for qubits. Let’s jump in. 


9.2 Classical Computation Models — Informal Ap- 
proach 


9.2.1 Informal Definition of Bits and Gates 


In order to study the qubit, let’s establish a linguistic foundation by defining the more
familiar classical bit.




Folksy Definition of a Bit 
Here’s one possible way to define a bit without “going formal.” 


A Bit (Informal). A “bit” is an entity, x, capable of being in one of
two states which we label “0” and “1.”


The main take-away is that 0 and 1 are not bits. They are the values or states that
the bit can attain.

We can use the notation

\[
x \;=\; 0
\]

to mean that “x is in the state 0,” an observation (or possibly a question) about the
state bit x is in. We also use the same notation to express the imperative, “put x into
the state 0.” The latter is the programmer’s assignment statement.


Folksy Definition of a Logical Operator a.k.a. Gate 


What about the logical operators like AND or XOR which transform bits to other 
bits? We can define those, too, using similarly loose language. 


Classical Logical (Boolean) Operator. A logical operator is a func- 
tion that takes one or more bits as input and produces a single bit as 
output. 


Logical operators are also called logic gates (or just gates) when implemented in
circuit diagrams.


Note: We are only considering functions that have a single output bit. If one 
wanted to build a logic gate with multiple output bits it could be done by combining 
several single-output logic gates, one for each output bit. 


Examples of Logical Operators 


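The Negation Operator. NOT (symbol ¬) is a logical operator on a single bit
defined by the formula

\[
\neg x \;\equiv\; \begin{cases} 0, & \text{if } x = 1 \text{ and} \\ 1, & \text{otherwise.} \end{cases}
\]

A common alternate notation for the NOT operator is the “overline,” $\overline{x} = \neg x$.


The Exclusive-Or Operator. XOR (symbol ⊕) is an operator on two bits
defined by the formula

\[
x \oplus y \;\equiv\; \begin{cases} 0, & \text{if } x = y \text{ and} \\ 1, & \text{otherwise.} \end{cases}
\]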


Mathematically, x @ y is called “the mod-2 sum of x and y,” language that is used 
throughout classical and quantum logic. 


Unary and Binary Operators 


While logic gates can take any number of inputs, those that have one or two inputs 
are given special names. 


A unary operator is a gate that takes a single input bit, and a binary
operator is one that takes two input bits.


As you can see, NOT is a unary operator while XOR is a binary operator. 


Truth Tables 


Informally, unary and binary operators are often described using truth tables. Here 
are the traditional gate symbols and truth tables for NOT and XOR. 


[Gate symbols for NOT and XOR are not reproduced here.]

    x    ¬x (or x̄)
    0    1
    1    0

    x    y    x ⊕ y
    0    0    0
    0    1    1
    1    0    1
    1    1    0


We could go on like this to define more operators and their corresponding logic gates. 
However, it’s a little disorganized and will not help you jump to a qubit, so let’s try 
something a little more formal. 
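
Before we go formal, here is how a programmer might write down the informal definitions of NOT and XOR and print their truth tables. This is only a sketch in Python; the function names logical_not and xor are my own labels, not notation used elsewhere in this course.

```python
def logical_not(x):
    """NOT: 0 if x is 1, and 1 otherwise."""
    return 0 if x == 1 else 1


def xor(x, y):
    """XOR: 0 if x equals y, and 1 otherwise (the mod-2 sum of x and y)."""
    return 0 if x == y else 1


# Truth table for the unary operator NOT
not_table = [(x, logical_not(x)) for x in (0, 1)]

# Truth table for the binary operator XOR
xor_table = [(x, y, xor(x, y)) for x in (0, 1) for y in (0, 1)]

for row in not_table:
    print(row)
for row in xor_table:
    print(row)
```

Notice that xor(x, y) always agrees with (x + y) % 2, which is why XOR is called the mod-2 sum.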


Our short formal development of classical logic in this lesson will be restricted to 
the study of unary operators. We are learning about a single qubit today which is 
analogous to one classical bit and operators that act on only one classical bit, i.e., 
unary operators. It sounds a little boring, I know, but that’s because in classical logic, 
unary operators are boring (there are only four). But as you are about to see, in the 
quantum world there are infinitely many different unary operators, all useful. 




9.3. Classical Computation Models — Formal Ap- 
proach 


The definition of a quantum bit is necessarily abstract, and if I were to define it at 
this point, you might not recognize the relationship between it and a classical bit. 
To be fair to qubits, we’ll give a formal definition of a classical bit first, using the 
same language we will need in the quantum case. This will allow us to establish some 
vocabulary in the classical context that will be re-usable in the quantum world and 
give us a reference for comparing the two regimes on a level playing field. 


9.3.1 A Miniature Vector Space 


You are familiar with $\mathbb{R}^2$, the vector space of ordered pairs of real numbers. I’d like
to introduce you to an infinitely simpler vector space, $\mathcal{B} = \mathbb{B}^2$.


The Tiny “Field” $\mathbb{B}$


In place of the field $\mathbb{R}$ of real numbers, we define a new field of numbers,

\[
\mathbb{B} \;\equiv\; \{0, 1\} .
\]

That’s right, just the set containing two numbers. But we allow the set to have
addition, ⊕, and multiplication, ·, defined by

    ⊕ is addition mod-2

        0 ⊕ 1 = 1
        1 ⊕ 0 = 1
        0 ⊕ 0 = 0
        1 ⊕ 1 = 0

    · is ordinary multiplication

        0 · 1 = 0
        1 · 0 = 0
        0 · 0 = 0
        1 · 1 = 1

Of course ⊕ is nothing other than the familiar XOR operation, although in this
context we also get negative mod-2 numbers (−1 = 1) and subtraction mod-2 (0 ⊖ 1 =
0 ⊕ (−1) = 1), should we need them.
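
With only two numbers in play, the arithmetic rules can be checked by brute force. The Python sketch below does that for a handful of field properties. The helper names (add, mul, neg, sub) are hypothetical labels of mine, and a brute-force check like this is a sanity test, not a substitute for writing out the argument yourself.

```python
B = (0, 1)  # the two scalars of the tiny field


def add(a, b):
    """Addition mod-2 (the same thing as XOR)."""
    return (a + b) % 2


def mul(a, b):
    """Ordinary multiplication, which never leaves {0, 1}."""
    return a * b


def neg(a):
    """Additive inverse mod-2: -0 = 0 and -1 = 1."""
    return a


def sub(a, b):
    """Subtraction mod-2, defined as a + (-b)."""
    return add(a, neg(b))


checks = {
    "commutative": all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a)
                       for a in B for b in B),
    "associative": all(add(add(a, b), c) == add(a, add(b, c))
                       and mul(mul(a, b), c) == mul(a, mul(b, c))
                       for a in B for b in B for c in B),
    "distributive": all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
                        for a in B for b in B for c in B),
    "additive inverses": all(add(a, neg(a)) == 0 for a in B),
    "multiplicative inverses": all(any(mul(a, b) == 1 for b in B)
                                   for a in B if a != 0),
}

for name, passed in checks.items():
    print(name, passed)
```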


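The Vector Space $\mathcal{B}$


We define $\mathcal{B} = \mathbb{B}^2$ to be the vector space whose scalars come from $\mathbb{B}$ and whose vectors
(objects) are ordered pairs of numbers from $\mathbb{B}$. This vector space is so small I can list
its objects on one line,

\[
\mathcal{B} \;=\; \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} 0 \\ 1 \end{pmatrix},\; \begin{pmatrix} 1 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\} .
\]

I’m not going to bother proving that $\mathcal{B}$ obeys all the properties of a vector space, and
you don’t have to either. But if you are interested, it’s a fun ...


[Exercise. Show that $\mathbb{B}$ obeys the properties of a field (multiplicative inverses,
distributive properties, etc.) and that $\mathcal{B}$ obeys the properties of a vector space.]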


The Mod-2 Inner Product


Not only that, but there is an “inner product,”

\[
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \odot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}
\;\equiv\; x_1 \cdot x_2 \;\oplus\; y_1 \cdot y_2 .
\]

You’ll notice something curious about this “inner product” (which is why I put it in
quotes and used the operator “⊙” rather than “·”). The vector $(1, 1)^t$ is a non-zero
vector which is orthogonal to itself. Don’t let this bother you. In computer science,
such non-standard inner products exist. (Mathematicians call them “pairings,” since
they are not positive definite, i.e., it is not true that $v \neq 0 \Rightarrow \|v\| > 0$. We’ll
just call ⊙ a weird inner product.) This one will be the key to Simon’s algorithm,
presented later in the course. The inner product gives rise to a modulus (or length)
for vectors through the same mechanism we used in $\mathbb{R}^2$,

\[
\|x\| \;=\; |x| \;\equiv\; \sqrt{x \odot x} ,
\]

where I have shown two different notations for modulus. With this definition we find

\[
\begin{array}{ll}
\left\| \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\| = 0 , &
\left\| \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\| = 1 , \\[2ex]
\left\| \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\| = 1 \quad\text{and} &
\left\| \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\| = 0 .
\end{array}
\]

The strange (but necessary) equality on the lower right is a consequence of this
oddball inner product on $\mathcal{B} = \mathbb{B}^2$.
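
Here is a short Python sketch of the mod-2 inner product and the modulus it induces, run over all four vectors of the space. The function names inner and modulus are my own, and vectors are represented as plain tuples, which is an assumption made for brevity.

```python
VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # every vector in the space


def inner(v, w):
    """Mod-2 inner product: x1*x2 XOR y1*y2."""
    return (v[0] * w[0] + v[1] * w[1]) % 2


def modulus(v):
    """Length of v: the square root of inner(v, v).

    Over {0, 1} the square root of 0 is 0 and the square root of 1 is 1,
    so the modulus is simply inner(v, v).
    """
    return inner(v, v)


lengths = {v: modulus(v) for v in VECTORS}
for v, length in lengths.items():
    print(v, "has length", length)

# The non-zero vector (1, 1) is orthogonal to itself
print("inner((1, 1), (1, 1)) =", inner((1, 1), (1, 1)))
```

Only (1, 0) and (0, 1) come out with length 1, and (1, 1) shares length 0 with the zero vector.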


[Exercise. Verify the above moduli.]


[Notation. Sometimes I’ll use an ordinary dot for the inner product, $(x_1, y_1)^t \cdot
(x_2, y_2)^t$, instead of the circle dot, $(x_1, y_1)^t \odot (x_2, y_2)^t$. When you see a vector on
each side of “·” you’ll know that we really mean the mod-2 inner product, not mod-2
multiplication.]


Dimension of $\mathcal{B}$


If $\mathcal{B}$ is a vector space, what is its dimension, and what is its natural basis? That’s
not hard to guess. The usual suspects will work. It’s a short exercise.


[Exercise. Prove that $\mathcal{B}$ is 2-dimensional by showing that

\[
\begin{pmatrix} 1 \\ 0 \end{pmatrix} , \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]

form an orthonormal basis. Hint: There are only four vectors, so express each in
this basis. As for linear independence and orthonormality, I leave that to you.]


9.3.2 Formal Definition of a (Classical) Bit 


Definition of Bit. A “bit” is (any copy of) the entire vector space $\mathcal{B}$.


Sounds strange, I know, making a bit equal to an entire vector space. We think of a
bit as capable of holding a 1 or a 0. This is expressed as follows.


Definition of Bit Value. A “bit’s value” or “state” is any normalized
(i.e., unit) vector in $\mathcal{B}$.


A bit, itself, is not committed to any particular value until we say which unit-vector
in $\mathcal{B}$ we are assigning it. Since there are only two unit vectors in $\mathcal{B}$, that narrows the
field down to one of two values, which I’ll label as follows:

\[
[0] \;\equiv\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad
[1] \;\equiv\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} .
\]

The other two vectors, $(0, 0)^t$ and $(1, 1)^t$, have length 0 so cannot be normalized (i.e.,
we cannot divide them by their length to form a unit vector).


How Programmers Can Think About Formal Bits 


If the definition feels too abstract, try this out for size. As a programmer, you’re
familiar with the idea of a variable (LVALUE) which corresponds to a memory
location. That variable is capable of holding one specific value (the RVALUE) at any
point in time, although there are many possible values we can assign it. The vector
space $\mathcal{B}$ is like that memory location: it is capable of holding one of several different
values but is not committed to any until we make the assignment. Putting a value
into a memory location with an assignment statement like “x = 51;” corresponds to
choosing one of the unit vectors in $\mathcal{B}$ to be assigned to the bit. So the values allowed
to be stored in the formal bit (a copy of $\mathcal{B}$) are any of the unit vectors in $\mathcal{B}$.


Multiple Bits 


If we have several bits we have several copies of $\mathcal{B}$, and we can name each with a
variable like x, y or z. We can assign values to these variables with the familiar
syntax 


etc. 


A classical bit (uncommitted to a value) can also be viewed as a variable linear 
combination of the two basis vectors in B. 


Alternate Definition of Bit. A “bit” is a variable superposition of the
two natural basis vectors of $\mathcal{B}$,

\[
x \;=\; \alpha\,[0] \;+\; \beta\,[1] , \quad\text{where}\quad \alpha^{2} \oplus \beta^{2} \;=\; 1 .
\]

Since $\alpha$ and $\beta$ are scalars of $\mathbb{B}$, they can only be 0 and 1, so the normalization
condition implies exactly one of them is 1 and the other is 0.


Two questions are undoubtedly irritating you. 


1. Is there any benefit, outside of learning quantum computation, of having such 
an abstract and convoluted definition of a classical bit? 


2. How will this definition help us grasp the qubit of quantum computing? 


Keep reading. 


9.3.3. Formal Definition of a (Classical) Logical Operator 


We will only consider unary (i.e., one bit) operators in this section. Binary operators 
come later. 


Unary Operators 


Definition of Logical Unary Operator. A “logical unary opera- 
tor” or (“one bit” operator) is a linear transformation of B that maps 
normalized vectors to other normalized vectors. 


This is pretty abstract, but we can see how it works by looking at the only four logical 
unary operators in sight. 




• The constant-[0] operator A(x) = [0]. This maps any bit into the 0-bit.
(Don’t forget, in $\mathcal{B}$, the 0-bit is not the 0-vector, it is the unit vector $(1, 0)^t$.)
Using older, informal truth tables, we would describe this operator as follows:

    x    [0]-op
    0    0
    1    0

In our new formal language, the constant-[0] operator corresponds to the linear
transformation whose matrix is

\[
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} ,
\]

since, for any unit vector (bit value) $(\alpha, \beta)^t$, we have

\[
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\;=\; \begin{pmatrix} \alpha \oplus \beta \\ 0 \end{pmatrix}
\;=\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \;=\; [0] ;
\]

the second-from-last equality is due to the fact that, by the normalization
requirement on bits, exactly one of $\alpha$ and $\beta$ must be 1.


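• The constant-[1] operator A(x) = [1]. This maps any bit into the 1-bit.
Informally, it is described by:

    x    [1]-op
    0    1
    1    1

Formally, it corresponds to the linear transformation

\[
\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} ,
\]

since, for any unit vector (bit value) $(\alpha, \beta)^t$, we have

\[
\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ \alpha \oplus \beta \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \;=\; [1] .
\]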


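• The negation (or NOT) operator A(x) = ¬x. This maps any bit into its
logical opposite. It corresponds to

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ,
\]

since, for any $x = (\alpha, \beta)^t$, we get

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\;=\; \begin{pmatrix} \beta \\ \alpha \end{pmatrix} \;=\; \neg x .
\]

[Exercise. Using the formula, verify that ¬[0] = [1] and ¬[1] = [0].]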


• The identity operator A(x) = $\mathbb{1}$x = x. This maps any bit into itself and
corresponds to

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .
\]

[Exercise. Perform the matrix multiplication to confirm that $\mathbb{1}$[0] = [0] and
$\mathbb{1}$[1] = [1].]


Linear Transformations that are Not Unary Operators


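Apparently any linear transformation on $\mathcal{B}$ other than the four listed above will not
correspond to a logical unary operator. For example, the zero-operator

\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\]

isn’t listed. This makes sense since it does not map normalized vectors to normalized
vectors. For example,

\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} [0]
\;=\; \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;=\; \mathbf{0} ,
\]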


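not a unit vector. Another example is the matrix

\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} ,
\]

since it maps the bit [1] to the non-normalizable $\mathbf{0}$,

\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} [1]
\;=\; \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;=\; \mathbf{0} .
\]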


[Exercise. Demonstrate that

\[
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\]

does not represent a logical operator by finding a unit vector that it maps to a
non-unit vector, thus violating the definition of a logical unary operator.]
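
Since there are only sixteen 2×2 matrices with entries 0 and 1, we can simply try them all. The Python sketch below (with helper names of my own choosing, and matrices stored as nested tuples) applies each matrix with mod-2 arithmetic and keeps those that send both unit vectors to unit vectors.

```python
from itertools import product

UNIT_VECTORS = [(1, 0), (0, 1)]  # the bit values [0] and [1]


def apply(m, v):
    """Multiply the 2x2 matrix m by the column vector v, with mod-2 arithmetic."""
    return ((m[0][0] * v[0] + m[0][1] * v[1]) % 2,
            (m[1][0] * v[0] + m[1][1] * v[1]) % 2)


def is_unary_operator(m):
    """True if m maps every unit vector to a unit vector."""
    return all(apply(m, v) in UNIT_VECTORS for v in UNIT_VECTORS)


all_matrices = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
unary_operators = [m for m in all_matrices if is_unary_operator(m)]

for m in unary_operators:
    print(m)
```

Only four matrices should survive the filter: the two constant operators, NOT and the identity.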




Defining logical operators as matrices provides a unified way to represent the 
seemingly random and unrelated four logical operators that we usually define using 
isolated truth tables. There’s a second advantage to this characterization. 




Reversible Logic Gates 


A reversible operator or logic gate is one that can be undone by applying another
operator (which might be the same as the original). In other words, its associated matrix
has an inverse. For example, the constant-[1] operator is not reversible, because it
forces its input to the output state [1],

\[
\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \;=\; [1] ,
\]

and there’s no way to reconstruct $(\alpha, \beta)^t$ reliably from the constant bit [1]; it has
erased information that can never be recovered. Thus, the operator, and its associated
logic gate, is called irreversible.


Of the four logical operators on one classical bit, only two of them are reversible:
the identity, $\mathbb{1}$, and negation, ¬. In fact, to reverse them, they can each be reapplied
to the output to get back the original input bit value with 100% reliability.


Characterization of Reversible Operators. An operator is
“reversible” ⇔ its matrix is unitary.


Example. We show that the matrix for ¬ is unitary by looking at its columns.
The first column is

\[
\begin{pmatrix} 0 \\ 1 \end{pmatrix} ,
\]

which

• has unit length, since $(0, 1)^t \cdot (0, 1)^t = 1$, and

• is orthogonal to the second column vector, since $(0, 1)^t \cdot (1, 0)^t = 0$.


Reversibility is a very powerful property in quantum computing, so our ability to 
characterize it with a simple criterion such as unitarity is of great value. This is easy 
to check in the classical case, since there are only four unary operators: two have 
unitary matrices and two do not. 


[Exercise. Show that the matrix for $\mathbb{1}$ is unitary and that the two constant
operators’ matrices are not.]
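
The column test for unitarity and the “can it be undone?” test for reversibility are both easy to automate. The following Python sketch is a hedged illustration with hypothetical helper names: it checks the columns of each of the four operator matrices with the mod-2 inner product, and separately searches all sixteen 2×2 matrices over {0, 1} for an inverse.

```python
from itertools import product

IDENTITY = ((1, 0), (0, 1))
OPERATORS = {
    "constant-[0]": ((1, 1), (0, 0)),
    "constant-[1]": ((0, 0), (1, 1)),
    "NOT": ((0, 1), (1, 0)),
    "identity": IDENTITY,
}
ALL_MATRICES = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]


def inner(v, w):
    """Mod-2 inner product of two vectors."""
    return (v[0] * w[0] + v[1] * w[1]) % 2


def is_unitary(m):
    """Column test: both columns have unit length and are orthogonal."""
    col1, col2 = (m[0][0], m[1][0]), (m[0][1], m[1][1])
    return inner(col1, col1) == 1 and inner(col2, col2) == 1 and inner(col1, col2) == 0


def matmul(a, b):
    """Product of two 2x2 matrices with mod-2 arithmetic."""
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) % 2
                       for c in range(2)) for r in range(2))


def is_reversible(m):
    """True if some matrix n undoes m, i.e. n * m is the identity."""
    return any(matmul(n, m) == IDENTITY for n in ALL_MATRICES)


results = {name: (is_unitary(m), is_reversible(m)) for name, m in OPERATORS.items()}
for name, (unitary, reversible) in results.items():
    print(f"{name}: unitary = {unitary}, reversible = {reversible}")
```

For these four operators the two tests should agree: NOT and the identity pass both, and the two constant operators fail both.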


An Anomaly with Unitary Matrices in $\mathcal{B}$

We learned that unitary transformations have the following equivalent properties.


1. They preserve the lengths of vectors: $\|Av\| = \|v\|$,


2. they preserve inner products, $\langle Av \,|\, Aw \rangle = \langle v \,|\, w \rangle$, and


3. their rows (or columns) are orthonormal.


While this is true of usual (i.e., positive definite) inner products, the pairing in $\mathcal{B}$
causes these conditions to lose sync. All four possible logical operators on bits do
preserve lengths and so meet condition 1; we’ve already shown that they map unit
vectors to unit vectors, and we can also check that they map vectors of length zero
to other vectors of length zero:


[Exercise. Verify that the constant-[1] operator maps the zero-length vectors
$(1, 1)^t$ and $(0, 0)^t$ to the zero-length $(0, 0)^t$. Do the same for the constant-[0] operator.
Thus both of these non-unitary operators preserve the (oddball) mod-2 length.]


However, the constant-[1] and [0] operators do not preserve inner products, nor do
they have unitary matrices.

[Exercise. Find two vectors in $\mathcal{B}$ that have inner product 0, yet whose two images
under the constant-[1] operator have inner product 1.]


This unusual situation has the consequence that, of the four linear transformations 
which qualify to be logical operators (preserving lengths) in B, only two are reversible 
(have unitary matrices). 


Although the preservation-of-lengths condition (1), is enough to provide B 
with four logical operators, that condition is not adequate to assure that 
they are all unitary (reversible). Only two of the operators satisfy the 


stronger requirements 2 or 3. 


This distinction between length-preserving operators and unitary operators is not 
replicated in qubits, and the consequences will be profound. 
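
The gap between conditions 1 and 2 can be seen by exhaustive checking, since the space has only four vectors. The Python sketch below is an illustration under my own naming assumptions: for each of the four logical operators it tests whether lengths are preserved for every vector, and whether inner products are preserved for every pair of vectors.

```python
VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]
OPERATORS = {
    "constant-[0]": ((1, 1), (0, 0)),
    "constant-[1]": ((0, 0), (1, 1)),
    "NOT": ((0, 1), (1, 0)),
    "identity": ((1, 0), (0, 1)),
}


def inner(v, w):
    """Mod-2 inner product."""
    return (v[0] * w[0] + v[1] * w[1]) % 2


def modulus(v):
    """Mod-2 length; over {0, 1} the square root of inner(v, v) is itself."""
    return inner(v, v)


def apply(m, v):
    """Matrix times column vector with mod-2 arithmetic."""
    return ((m[0][0] * v[0] + m[0][1] * v[1]) % 2,
            (m[1][0] * v[0] + m[1][1] * v[1]) % 2)


def preserves_lengths(m):
    """Condition 1, checked on every vector."""
    return all(modulus(apply(m, v)) == modulus(v) for v in VECTORS)


def preserves_inner_products(m):
    """Condition 2, checked on every pair of vectors."""
    return all(inner(apply(m, v), apply(m, w)) == inner(v, w)
               for v in VECTORS for w in VECTORS)


summary = {name: (preserves_lengths(m), preserves_inner_products(m))
           for name, m in OPERATORS.items()}
for name, (lengths_ok, inner_ok) in summary.items():
    print(f"{name}: lengths preserved = {lengths_ok}, inner products preserved = {inner_ok}")
```

All four operators should pass the length test, while only NOT and the identity pass the inner product test.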


9.4 The Qubit 


We are finally ready to go quantum. At each turn there will be a classical concept 
and vocabulary available for comparison because we invested a few minutes studying 
the formal definitions in that familiar context. 


9.4.1 Quantum Bits 


Definition of Qubit. A “quantum bit,” a.k.a. “qubit,” is (any copy 
of) the entire vector space H. 


Notice that we don’t need to restrict ourselves, yet, to the projective sphere. That 
will come when we describe the allowed values a bit can take. 




Comparison with Classical Logic 


The classical bit was a 2-D vector space, B, over the finite scalar field B, and now we 
see that a quantum bit is a 2-D vector space, H, over the infinite scalar field C. 


Besides the underlying scalar fields being different — the tiny 2-element B vs. the 
infinite and rich C — the vectors themselves are worlds apart. There are only four 
vectors in $\mathcal{B}$, while H has infinitely many.


However, there is at least one similarity: they are both two dimensional. The
natural basis for $\mathcal{B}$ is [0] and [1], while the natural basis for H is $|+\rangle$ and $|-\rangle$.


Quantum Computing Vocabulary 


The quantum computing world has adopted different terms for many of the established 
quantum physics entities. This will be the first of several sections introducing that 
new vocabulary. 


Symbols for Basis Vectors. To reinforce the connection between the qubit 
and the bit, we abandon the vector symbols |+) and |—) and in their places use |0) 
and |1) — same vectors, different names. This is true whether we are referring to the 
preferred z-basis, or any other orthonormal basis. 


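\[
\begin{array}{ll}
|+\rangle \;\leftrightarrow\; |0\rangle , & |-\rangle \;\leftrightarrow\; |1\rangle \\
|+\rangle_x \;\leftrightarrow\; |0\rangle_x , & |-\rangle_x \;\leftrightarrow\; |1\rangle_x \\
|+\rangle_y \;\leftrightarrow\; |0\rangle_y , & |-\rangle_y \;\leftrightarrow\; |1\rangle_y
\end{array}
\]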


Alternate x-Basis Notation. Many authors use the shorter notation for the
x-basis,

\[
|0\rangle_x \;\equiv\; |+\rangle , \qquad |1\rangle_x \;\equiv\; |-\rangle ,
\]

but I will eschew that for the time being; $|+\rangle$ and $|-\rangle$ already have a z-basis meaning
in ordinary quantum mechanics, and using them for the x-basis too soon will cause
confusion. However, be prepared for me to call $|+\rangle$ and $|-\rangle$ into action as x-basis
CBS, particularly when we need the variable x for another purpose.


Computational Basis States. Instead of using the term eigenbasis, computer
scientists refer to computational basis states (or CBS when I’m in a hurry). For
example, we don’t talk about the eigenbasis of $S_z$, $\{\, |+\rangle , |-\rangle \,\}$. Rather, we speak of
the preferred computational basis, $\{\, |0\rangle , |1\rangle \,\}$. You are welcome to imagine it as being
associated with the observable $S_z$, but we really don’t care what physical observable
led to this basis. We only care about the Hilbert space H, not the physical system,
$\mathscr{S}$, from which it arose. We don’t even know what kind of physics will be used to
build quantum computers (yet). Whatever physical hardware is used, it will give us
the 2-D Hilbert space H.




Alternate bases like { |0⟩_x, |1⟩_x }, { |0⟩_y, |1⟩_y } or even { |0⟩_n̂, |1⟩_n̂ } for some
direction n̂, when needed, are also called computational bases, but we usually qualify
them using the term alternate computational basis. We still have the short-hand
terms z-basis, x-basis, etc., which avoid the naming conflict altogether. These alternate
computational bases are still defined by their expansions in the preferred z-basis
(|0⟩_x = (|0⟩ + |1⟩)/√2, e.g.), and all the old relationships remain.


9.4.2 Quantum Bit Values 


Definition of Qubit Value. The “value” or “state” of a qubit is any
unit (or normalized) vector in H.


In other words, a qubit is an entity — the Hilbert space H — whose value can be any 
vector on the projective sphere of that space. 


A qubit, itself, is not committed to any particular value until we say which specific 
unit-vector in H we are assigning to it. 


Normalization 


Why do we restrict qubit values to the projective sphere? This is a quantum system, 
so states are always normalized vectors. That’s how we defined the state space in our 
quantum mechanics lesson. A qubit’s state (or the state representing any quantum 
system) has to reflect probabilities which sum to 1. That means the magnitude- 
squared of all the amplitudes must sum to 1. Well, that’s the projective sphere. Any 
vectors off that sphere cannot claim to reflect reality. 


Certainly there will be times when we choose to work with un-normalized vec- 
tors, especially in CS 83B, but that will be a computational convenience that must 
eventually be corrected by normalizing the answers. 
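
As a small illustration of that final normalizing step, here is a minimal Python sketch. The helper names `norm` and `normalize` and the sample vector are assumptions made up for this example; they are not part of the course materials.

```python
import math

def norm(state):
    """Length of a 2-component complex vector (alpha, beta)."""
    return math.sqrt(sum(abs(c) ** 2 for c in state))

def normalize(state):
    """Rescale a nonzero vector so that |alpha|^2 + |beta|^2 = 1."""
    length = norm(state)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [c / length for c in state]

raw = [3 + 0j, 4j]                          # un-normalized: squared magnitudes sum to 25
psi = normalize(raw)                        # back on the projective sphere
probabilities = [abs(c) ** 2 for c in psi]  # these now sum to 1
```

Only after this rescaling can the squared magnitudes be read as measurement probabilities.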


Comparison with Classical Logic 


Just as a classical bit was capable of storing a state [0] or [1] (the two unit vectors 
in B), qubits can be placed in specific states which are normalized vectors in H. 
This time, however, there are infinitely many unit vectors in H: we have the entire 
projective sphere from which to choose. 


9.4.3 Usual Definition of Qubit and its Value 


A more realistic working definition of qubit parallels that of the alternative formal 
definition of bit. 




Alternative Definition of Qubit. A “qubit” is a variable superposition 
of the two natural basis vectors of H, 


$$|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \quad \text{where the complex scalars satisfy}$$

$$|\alpha|^2 + |\beta|^2 = 1.$$


I used the word “variable” to call attention to the fact that the qubit stores a value, 
but is not the value, itself. 


Global Phase Factors 

We never forget that even if we have a normalized state,

$$|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,$$

it is not a unique representation; any length-1 scalar multiple

$$e^{i\theta}|\psi\rangle = e^{i\theta}\alpha\,|0\rangle + e^{i\theta}\beta\,|1\rangle, \quad \text{real } \theta,$$

still resides on the projective sphere, and since it is a scalar multiple of |ψ⟩, it is a
valid representative of the same state or qubit value. We would say “|ψ⟩ and e^{iθ}|ψ⟩ differ
by an overall, or global, phase factor θ,” a condition that does not change anything,
but can be used to put |ψ⟩ into a more workable form.
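
To see numerically that a global phase factor leaves the measurement probabilities untouched, consider this short Python sketch. The amplitudes and the angle are arbitrary values assumed for illustration.

```python
import cmath

def probabilities(state):
    """Squared magnitudes of the two amplitudes."""
    return [abs(c) ** 2 for c in state]

alpha, beta = 0.6, 0.8j          # assumed sample amplitudes, already normalized
theta = 1.234                    # any real angle works here
phase = cmath.exp(1j * theta)    # unit scalar e^{i theta}

psi = [alpha, beta]
psi_shifted = [phase * alpha, phase * beta]

p_before = probabilities(psi)
p_after = probabilities(psi_shifted)   # agrees with p_before up to rounding
```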

Comparison with Classical Logic

The alternate/working expressions for bits in the two regimes look similar.

Classical: $x = \alpha\,[0] + \beta\,[1], \quad \alpha^2 \oplus \beta^2 = 1$

Quantum: $|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \quad |\alpha|^2 + |\beta|^2 = 1$


In the classical case, α and β could only be 0 or 1, and they could not be the same
(normalizability), leading to only two possible states for x, namely, [0] or [1]. In the
quantum case, α and β can be any of an infinite combination of complex scalars,
leading to an infinite number of distinct values for the state |ψ⟩.


9.4.4 Key Difference Between Bits and Qubits 


What’s the big idea behind qubits? They are considerably more difficult to define and 
study than classical bits, so we deserve to know what we are getting for our money. 




Parallel Processing 


The main motivation comes from our Trait #6, the fourth postulate of quantum
mechanics. It tells us that until we measure the state |ψ⟩, it has a probability of
landing in either state, |0⟩ or |1⟩, the exact details given by the magnitudes of the
complex amplitudes α and β.


As long as we don’t measure |ψ⟩, it is like the Centaur of Greek mythology: part |0⟩
and part |1⟩; when we process this beast with quantum logic gates, we will be sending
both alternative binary values through the hardware in a single pass. That will change
it to another normalized state vector (say |φ⟩), which has different amplitudes, and
that can be sent through further logic gates, again retaining the potential to be part
|0⟩ and part |1⟩, but with different probabilities.


Classical Computing as a Subset 


Classical logic can be emulated using quantum gates. 


This is slightly less obvious than it looks, and we’ll have to study how one con- 
structs even a simple AND gate using quantum logic. Nevertheless, we can correctly 
forecast that the computational basis states, |0) and |1), will correspond to the clas- 
sical bit values [0] and [1]. These two lonely souls, however, are now swimming in a 
continuum of non-classical states. 


Measurement Turns Qubits into Bits 


Well, this is not such a great selling point. If qubits are so much better than bits, 
why not leave them alone? Worse still, our Trait #7, the fifth postulate of quantum 
mechanics, means that even if we don’t want to collapse the qubit into a computational 
basis state, once we measure it, we will have done just that. We will lose the exquisite 
subtleties of α and β, turning the entire state into a |0⟩ or |1⟩.


Once you go down this line of thought, you begin to question the entire enterprise. 
We can’t get any answers if we don’t test the output states, and if they always collapse, 
what good did the amplitudes do? This skepticism is reasonable. For now, I can only 
give you some ideas, and ask you to wait to see the examples. 


1. We can do a lot of processing “below the surface of the quantum ocean,” ma- 
nipulating the quantum states without attempting an information-destroying 
measurement that “brings them up for air” until the time is right. 


2. We can use Trait #6, the fourth postulate, in reverse: Rather than looking at 
amplitudes as a prediction of experimental outcomes, we can view the relative 
distribution of several measurement outcomes to guess at the amplitudes of the 
output state. 


3. By preparing our states and quantum logic carefully, we can “load the dice” so
that the likelihood of getting an information-rich collapsed result will be greatly
enhanced.


9.5 Quantum Operators (Unary) 


9.5.1 Definition and Notation 


Definition of a Unary Quantum Logical Operator. A “unary 
quantum logical operator” or “unary quantum gate” is a linear 
transformation of H that maps normalized (unit) vectors to other nor- 
malized vectors. 


The first important observation is that since H is 2-dimensional, a unary quantum
operator can be represented by a 2 × 2 matrix

$$\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}.$$

Notation. I am now numbering starting from 0 rather than 1. This will continue
for the remainder of the course.

The big difference between these matrices and those of classical unary operators is
that here the coefficients come from the infinite pool of complex numbers. Of course,
not every 2 × 2 matrix qualifies as a quantum gate, a fact we’ll get to shortly.


9.5.2 Case Study — Our First Quantum Gate, QNOT 


We'll examine every angle of this first example. It is precisely because of its simplic- 
ity that we can easily see the important differences between classical and quantum 
computational logic. 


The Quantum NOT (or QNOT) Operator, X 


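The QNOT operator swaps the amplitudes of any state vector. It corresponds to

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

the same matrix that represents the NOT operator, ¬, of classical computing. The
difference here is not in the operator but in the vast quantity of qubits to which we
can apply it. Using |ψ⟩ = (α, β)^t, as we will for this entire lecture, we find

$$X|\psi\rangle = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}.$$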


In the special case of a CBS ket, we find that this does indeed change the state from 
|0) to |1) and vice versa. 


[Exercise. Using the formula, verify that X |0) = |1) and X |1) = |0).] 
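
A quick numerical companion to the formula above, written as a minimal Python sketch. The helper `apply` and the sample state are assumptions of this example, not notation from the text.

```python
# QNOT as a 2x2 matrix stored as a list of rows.
X = [[0, 1],
     [1, 0]]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector (alpha, beta)."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

ket0 = [1, 0]            # |0> in z-basis coordinates
ket1 = [0, 1]            # |1> in z-basis coordinates
psi = [0.6, 0.8j]        # an assumed general state, alpha = 0.6, beta = 0.8i

flipped_basis = [apply(X, ket0), apply(X, ket1)]   # CBS kets trade places
flipped_psi = apply(X, psi)                        # amplitudes trade places
```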




Notation and Vocabulary 


The X operator is sometimes called the bit flip operator, because it “flips” the CBS
coefficients, α ↔ β. In the special case of a pure CBS input, like |0⟩, it “flips” it to
the other CBS, |1⟩.

The reason QNOT is usually labeled using the letter X is that, other than the
factor of ħ/2, the matrix is the same as the spin-1/2 observable S_x. In fact, you'll recall
from the quantum mechanics lesson that QNOT is precisely the Pauli spin matrix in
the x-direction,

$$X = \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$


It’s best not to read anything too deep into this. The matrix that models an ob- 
servable is used differently than one that performs a reversible operation on a qubit. 
Here, we are swapping amplitudes and therefore negating computational basis states. 
That’s the important take-away. 


Gate Symbol and Circuits 


In a circuit diagram, we would use

    ──[ X ]──

when we want to express an X gate or, less frequently, if we want to be overly explicit,

    ──[ QNOT ]──

We might show the effect of the gate right on the circuit diagram,

    α|0⟩ + β|1⟩ ──[ X ]── β|0⟩ + α|1⟩ .


The input state is placed on the left of the gate symbol and the output state on the 
right. 


Computational Basis State Emphasis 


Because any linear operator is completely determined by its action on a basis, you will 
often see an operator like X defined only on the CBS states, and you are expected to 
know that this should be extended to the entire H using linearity. In this case, the 
letters x and y are usually used to label a CBS, so

$$|x\rangle = \begin{cases} |0\rangle \\ \ \text{or} \\ |1\rangle \end{cases}$$

to be distinguished from |ψ⟩, which can take on infinitely many superpositions of
these two basis kets. With this convention, any quantum logic gate can be defined
by its action on the |x⟩ (and sometimes |y⟩, |z⟩ or |w⟩, if we need more input kets).
For the X gate, it might look like this

    |x⟩ ──[ X ]── |¬x⟩ .

This expresses the two possible input states and says that X|0⟩ = |¬0⟩ = |1⟩, while
X|1⟩ = |¬1⟩ = |0⟩. Using alternative notation,

    |x⟩ ──[ X ]── |x̄⟩ .

In fact, you’ll often see the mod-2 operator for some CBS logic. If we used that to
define X, it would look like this:

    |x⟩ ──[ X ]── |1 ⊕ x⟩ .


[Exercise. Why does the last expression result in a logical negation of the CBS?] 


You Must Remember This. The operators (⊕, ¬, etc.) used inside the kets
on the variables x and/or y apply only to the binary values 0 or 1 that label the basis
states. They make no sense for general states. We must extend linearly to the rest
of our Hilbert space.


Sample Problem 


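Given the definition of the bit flip operator, X, in terms of the CBS |x⟩,

    |x⟩ ──[ X ]── |x̄⟩ ,

what is the action of X on an arbitrary state |ψ⟩, and what is the matrix for X?


Expand |ψ⟩ along the computational basis and apply X:

$$
\begin{aligned}
X|\psi\rangle &= X\big(\alpha|0\rangle + \beta|1\rangle\big) \\
&= \alpha X|0\rangle + \beta X|1\rangle \\
&= \alpha|\bar{0}\rangle + \beta|\bar{1}\rangle \\
&= \alpha|1\rangle + \beta|0\rangle \\
&= \beta|0\rangle + \alpha|1\rangle,
\end{aligned}
$$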


in agreement with our original definition when viewed in coordinate form. 


The matrix for X would be constructed, as with any linear transformation, by 
applying X to each basis ket and setting the columns of the matrix to those results. 


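$$M_X = \Big( X|0\rangle,\ X|1\rangle \Big) = \left( \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$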




While the problem didn’t ask for it, let’s round out the study by viewing X in terms
of a ket’s CBS coordinates, (α, β)^t. For that, we can read it directly off the final
derivation of X|ψ⟩, above, or apply the matrix, which I will do now.

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}.$$

One Last Time: In the literature, most logic gates are defined in terms of |x⟩,
|y⟩, etc. This is only the action on the CBS, and it is up to us to fill in the blanks,
get its action on the general |ψ⟩ and produce the matrix for the gate.
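
The "fill in the blanks" recipe (apply the gate to each CBS ket and use the results as matrix columns) can be sketched in a few lines of Python. The function names below (`ket`, `matrix_from_cbs_action`, `x_action`, `identity_action`) are hypothetical helpers invented for this illustration.

```python
def ket(x):
    """z-basis coordinates of the CBS ket |x>, for x in {0, 1}."""
    return [1, 0] if x == 0 else [0, 1]

def matrix_from_cbs_action(action):
    """Build the 2x2 matrix whose column x is the image of |x> under the gate."""
    columns = [action(0), action(1)]
    return [[columns[c][r] for c in range(2)] for r in range(2)]

def x_action(x):
    """CBS definition of QNOT: |x> -> |1 XOR x>."""
    return ket(1 ^ x)

def identity_action(x):
    """CBS definition of the identity gate: |x> -> |x>."""
    return ket(x)

M_X = matrix_from_cbs_action(x_action)
M_1 = matrix_from_cbs_action(identity_action)
```

Once the matrix is in hand, linearity takes care of every other state: multiplying it by (α, β)^t gives the gate's action on a general ket.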


Comparison with Classical Logic 


We've seen that there is at least one classical operator, NOT, that has an analog, 
QNOT, in the quantum world. Their matrices look the same and they affect the 
classical bits, [0] / [1], and corresponding CBS counterparts, |0) / |1), identically, but 
that’s where the parallels end. 


What about the other three classical unary operators? You can probably guess
that the quantum identity,

$$\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$


exhibits the same similarities and differences as the QNOT did with the NOT. The 
matrices are identical and have the same effect on classical/CBS kets, but beyond 
that, the operators work in different worlds. 


[Exercise. Express the gate (i.e., circuit) definition of 1 in terms of a CBS |x⟩
and give its action on a general |ψ⟩. Discuss other differences between the classical
and quantum identity.]


That leaves the two constant operators, the [0]-op and the [1]-op. The simple
answer is there are no quantum counterparts for these. The reason gets to the heart 
of quantum computation. 


Unitarity is Not Optional 


The thing that distinguished ¬ and 1 from the [0]-op and the [1]-op in the classical case
was unitarity. The first two were unitary and the last two were not. How did those
last two even sneak into the classical operator club? They had the property that they
preserved the lengths of vectors. That’s all an operator requires. But these constant
ops were not reversible, which implied that their matrices were not unitary. The
quirkiness of certain operators being length-preserving yet non-unitary was a fluke
of nature caused by the strange mod-2 inner product on B. It allowed a distinction
between length-preservation and unitarity.


In quantum computing, no such distinction exists. The self-same requirement
that operators map unit vectors to other unit vectors in H forces operators to be
unitary. This is because H has a well-behaved (positive definite) inner product. We
saw that with any positive definite inner product, unitarity and length-preservation
are equivalent.

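But if you don’t want to be that abstract, you need only “try on” either constant
op for size and see if it fits. Let’s apply an attempted analog of the [0]-op to the unit
vector |0⟩_x. For fun, we’ll do it twice. First try out the (non-unitary) matrix for the
classical [0]-op in the quantum regime. We find

$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} |0\rangle_x = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \sqrt{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

not a unit vector. To see a different approach we show it using a putative operator,
A, defined by its CBS action that ignores the CBS input, always answering with a
|0⟩,

    |x⟩ ──[ A ]── |0⟩ .

Apply A to the same unit vector |0⟩_x.

$$A \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right) = \frac{A|0\rangle + A|1\rangle}{\sqrt{2}} = \frac{|0\rangle + |0\rangle}{\sqrt{2}} = \sqrt{2}\,|0\rangle,$$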


again, not normalized. 
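
The same "try it on for size" test is easy to run numerically. The sketch below is a minimal Python check under the assumption that the classical [0]-op matrix is the one used above; the names `ZERO_OP`, `apply` and `norm` are made up for this example.

```python
import math

ZERO_OP = [[1, 1],
           [0, 0]]       # classical [0]-op matrix, tried in the quantum regime

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def norm(state):
    """Length of a 2-component complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in state))

s = 1 / math.sqrt(2)
ket0_x = [s, s]                    # |0>_x in z-basis coordinates, a unit vector

image = apply(ZERO_OP, ket0_x)     # proportional to |0>, but too long
length_in = norm(ket0_x)
length_out = norm(image)           # sqrt(2), so the op is not length-preserving
```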


Measurement 


Finally, we ask what impact QNOT has on the measurement probabilities of a qubit. 
Here is a picture of the circuit with two potential measurement “access points,” A 
(before the gate) and B (after): 


    |ψ⟩ ──┬──[ X ]──┬──
          │         │
          ▼         ▼
          A         B

Of course, we know from Trait #7 (fifth postulate of QM) that once we measure the
state it collapses into a CBS, so we cannot measure both points on the same “sample”
of our system. If we measure at A, the system will collapse into either |0⟩ or |1⟩ and
we will no longer have |ψ⟩ going into X. So the way to interpret this diagram is to
visualize many different copies of the system in the same state. We measure some at
access point A and others at access point B. The math tells us that

$$\text{state} = \begin{cases} \alpha|0\rangle + \beta|1\rangle & \text{at Point A} \\ \beta|0\rangle + \alpha|1\rangle & \text{at Point B} \end{cases}$$


So, if we measure 1000 identical states at point A,

    |ψ⟩ ──[ measure ]

we will get

    # of “0”-measurements ≈ 1000 × |α|²   and
    # of “1”-measurements ≈ 1000 × |β|²,

by Trait #6, the fourth QM postulate. However, if we do not measure those states,
but instead send them through X and only then measure them (at B),

    |ψ⟩ ──[ X ]──[ measure ]

the same trait applied to the output expression tells us that the probabilities will be
swapped,

    # of “0”-measurements ≈ 1000 × |β|²   and
    # of “1”-measurements ≈ 1000 × |α|².
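
The bookkeeping for the two access points can be mimicked in Python. This is only a sketch: the sample state, the helper names, and the use of a seeded pseudo-random generator to stand in for repeated measurements are all assumptions of this example.

```python
import math
import random

X = [[0, 1],
     [1, 0]]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def expected_counts(state, shots=1000):
    """Trait #6: expected number of "0" and "1" outcomes in `shots` trials."""
    return [shots * abs(c) ** 2 for c in state]

def sample_counts(state, shots=1000, seed=0):
    """Simulate measuring `shots` identically prepared copies of the state."""
    rng = random.Random(seed)
    p0 = abs(state[0]) ** 2
    zeros = sum(1 for _ in range(shots) if rng.random() < p0)
    return [zeros, shots - zeros]

psi = [0.5, math.sqrt(3) / 2]            # |alpha|^2 = 1/4, |beta|^2 = 3/4
at_A = expected_counts(psi)              # roughly 250 zeros and 750 ones
at_B = expected_counts(apply(X, psi))    # roughly 750 zeros and 250 ones
simulated_A = sample_counts(psi)         # fluctuates around at_A
```

The simulated counts will wander around the expected ones, which is exactly why the text writes ≈ rather than =.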


Composition of Gates 

Recall from the lesson on linear transformations that any unitary matrix, U, satisfies

$$U^\dagger U = U U^\dagger = \mathbf{1},$$

where U† is the conjugate transpose, a.k.a. adjoint, of U. In other words, its adjoint is
also its inverse.

Because unitarity is required of all quantum gates (in particular QNOT), we
know that

$$X^\dagger X = X X^\dagger = \mathbf{1}.$$

But the matrix for X tells us that it is self-adjoint:

$$X^\dagger = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^\dagger = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = X.$$

Therefore, in this case, we get the even stronger identity

$$X^2 = \mathbf{1}.$$


X is its own inverse. If we apply X (or any self-adjoint gate) twice in succession
without an intervening measurement, we should get our original state back. For
QNOT, this means

    |ψ⟩ ──[ X ]──[ X ]── |ψ⟩ ,


which can be verified either algebraically, by multiplying the vectors and matrices, or
experimentally, by taking lots of sample measurements on identically prepared states.
Algebraically, for example, the state of the qubit at points A, B and C,

    |ψ⟩ ──┬──[ X ]──┬──[ X ]──┬──
          │         │         │
          ▼         ▼         ▼
          A         B         C

will be

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \longrightarrow \begin{pmatrix} \beta \\ \alpha \end{pmatrix} \longrightarrow \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.$$

This leads to the expectation that for all quantum gates, as long as we don’t measure
anything, we can keep sending the output of one into the input of another, and while
they will be transformed, the exquisite detail of the qubits’ amplitudes remains intact.
They may be hidden in algebraic changes caused by the quantum gates, but they will
be retrievable due to the unitarity of our gates. However, make a measurement, and
we will have destroyed that information; measurements are not unitary operations.
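
Here is a minimal Python sketch of the algebraic check. The helpers `matmul`, `dagger` and `apply` and the sample state are assumptions introduced for this illustration.

```python
X = [[0, 1],
     [1, 0]]
IDENTITY = [[1, 0],
            [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(m):
    """Conjugate transpose (adjoint) of a 2x2 matrix."""
    return [[m[c][r].conjugate() for c in range(2)] for r in range(2)]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

psi = [0.6, 0.8j]            # assumed sample state
at_A = psi                   # before any gate
at_B = apply(X, at_A)        # amplitudes swapped
at_C = apply(X, at_B)        # swapped back to the original state

x_is_self_adjoint = dagger(X) == X
x_squared = matmul(X, X)
```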


We've covered the only two classical gates that have quantum alter egos. Let’s go 
on to meet some one-bit quantum gates that have no classical counterpart. 


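9.5.3 The Phase Flip, Z


If the bit flip, X, is the operator that coincides with the physical observable S_x,
then the phase flip, Z, turns out to be the Pauli matrix σ_z associated with the S_z
observable.


The Z operator, whose matrix is defined to be

$$Z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

negates the second amplitude of a state vector, leaving the first unchanged,

$$Z|\psi\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha \\ -\beta \end{pmatrix}.$$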




Notation and Vocabulary 


Z is called a phase flip, because it changes (maximally) the relative phase of the two
amplitudes of |ψ⟩. Why is multiplication by −1 a maximal phase change? Because

$$-1 = e^{i\pi},$$

so multiplying β by this scalar is a π = 180° rotation of β in the complex plane. You
can’t get more “maximal” than rotating something in the complex plane by 180°.


[Exercise. Graph β and e^{iπ}β. If you are stuck, express β in polar form and use
the Euler formula.]


Furthermore, if we modify only one of the two amplitudes in a state’s CBS expansion,
that’s a relative phase change, something we know matters (unlike absolute
phase changes, which are inconsequential multiples of the entire ket by a complex
unit).

[Exercise. Write out a summary of why a relative phase change matters. Hint:
Review the sections about time evolution of a quantum state from the third (optional)
time-dependent quantum mechanics lesson, particularly Traits #13 and #14.]


The use of the symbol, Z, to represent this operator is, as already observed, due
to its matrix being identical to the σ_z of the S_z observable.


Gate Symbol and Circuits 


In a circuit diagram we use

    ──[ Z ]──

to express a Z gate. Its full effect on a general state in diagram form is

    α|0⟩ + β|1⟩ ──[ Z ]── α|0⟩ − β|1⟩ .


Computational Basis State Emphasis


It is uncommon to try to express Z in compact computational basis form; it leaves
|0⟩ unchanged, and negates |1⟩, an operation that has no classical counterpart (in B,
−1 = 1, so −[1] = [1], and the operation would look like an identity). However, if
pressed to do so, we could use

    |x⟩ ──[ Z ]── (−1)^x |x⟩ .

[Exercise. Show that this agrees with Z’s action on a general |ψ⟩ by expanding |ψ⟩
along the computational basis and using this formula while applying linearity.]




Measurement 


It is interesting to note that Z has no measurement consequences on a single qubit
since the magnitudes of both amplitudes remain unchanged by the gate. The circuit

    |ψ⟩ ──┬──[ Z ]──┬──
          │         │
          ▼         ▼
          A         B

produces the A-to-B transition

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \longrightarrow \begin{pmatrix} \alpha \\ -\beta \end{pmatrix},$$

yielding the same probabilities, |α|² and |β|², at both access points. Therefore, both
|ψ⟩ and Z|ψ⟩ will have identical measurement likelihoods; if we have 1000 electrons
or photons in spin state |ψ⟩ and 1000 in Z|ψ⟩, a measurement of each group will
throw about |α|² × 1000 into state |0⟩ and |β|² × 1000 into state |1⟩.


However, two states that have the same measurement probabilities are not 
necessarily the same state. 


The relative phase difference between |ψ⟩ and Z|ψ⟩ can be “felt” the moment we
try to combine (incorporate into a larger expression) either state using superposition.
Mathematically, we can see this by noticing that

$$\frac{|\psi\rangle + |\psi\rangle}{2} = |\psi\rangle,$$

i.e., we cannot create a new state from a single state, which is inherently linearly
dependent with itself, while a distinct normalized state can be formed by

$$\frac{|\psi\rangle + Z|\psi\rangle}{2\alpha} = \frac{(\alpha + \alpha)\,|0\rangle + (\beta - \beta)\,|1\rangle}{2\alpha} = \frac{2\alpha\,|0\rangle}{2\alpha} = |0\rangle.$$

Unless it so happens that α = 1 and β = 0, we get different results when using |ψ⟩
and Z|ψ⟩ in the second position of these two legal superpositions, demonstrating that
they do, indeed, have different physical consequences.
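
The two superpositions above can be reproduced numerically. The following Python sketch assumes a sample state with real amplitudes and a nonzero α; the helper names are invented for this illustration.

```python
Z = [[1, 0],
     [0, -1]]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def probabilities(state):
    """Squared magnitudes of the two amplitudes."""
    return [abs(c) ** 2 for c in state]

psi = [0.6, 0.8]                 # assumed sample state with alpha != 0
z_psi = apply(Z, psi)            # second amplitude negated
alpha = psi[0]

# Identical measurement statistics for |psi> and Z|psi> ...
same_odds = probabilities(psi) == probabilities(z_psi)

# ... yet they behave differently inside a superposition.
combo_same = [(a + b) / 2 for a, b in zip(psi, psi)]              # just |psi> again
combo_flip = [(a + b) / (2 * alpha) for a, b in zip(psi, z_psi)]  # collapses to |0>
```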


9.5.4 The Bit-and-Phase Flip Operator, Y 


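After seeing the X and Z operators, we are compelled to wonder about the operator
Y that corresponds to the Pauli matrix σ_y associated with the S_y observable. It is
just as important as those, but has no formal name (we'll give it one in a moment).


The Y operator is defined by the matrix

$$Y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},$$

which has the following effect on a general ket:

$$Y|\psi\rangle = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} -i\beta \\ i\alpha \end{pmatrix} \cong \begin{pmatrix} \beta \\ -\alpha \end{pmatrix},$$

that last equality accomplished by multiplying the result by the innocuous unit scalar,
i.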


Notation and Vocabulary 


Although Y has no official name, we will call it the bit-and-phase flip, because it flips 
both the bits and the relative phase, simultaneously. 


The symbol Y is used to represent this operator because it is identical to the Pauli
matrix σ_y.


Gate Symbol and Circuits


In a circuit diagram, we use

    ──[ Y ]──

to express a Y gate. Its full effect on a general state in diagram form is

    α|0⟩ + β|1⟩ ──[ Y ]── −iβ|0⟩ + iα|1⟩ .


Computational Basis State Emphasis 


No one expresses Y in computational basis form, but it can be done. 

[Exercise. Find a formula that expresses Y’s action on a CBS.]

What’s with the i? [Optional Reading]. Clearly, we can pull the i out
of the matrix (or the output ket) with impunity since we can always multiply any
normalized state by a unit scalar without changing the state. That makes us wonder
what it’s doing there in the first place. Superficially, the i makes Y = σ_y, a desirable
completion to the pattern started with X = σ_x and Z = σ_z. But more deeply, by
attaching a factor of i to this operator, we can, with notational ease, produce a linear
combination of the three matrices using real coefficients, n_x, n_y and n_z,

$$n_x \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + n_y \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} + n_z \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} n_z & n_x - i n_y \\ n_x + i n_y & -n_z \end{pmatrix}.$$

This may seem less than a compelling justification, but the expression is of the
syntactic form

$$\boldsymbol{\sigma} \cdot \hat{n},$$

where n̂ is a real 3-D unit vector (n_x, n_y, n_z)^t, σ is a “vector” of operators

$$\boldsymbol{\sigma} \equiv \begin{pmatrix} \sigma_x \\ \sigma_y \\ \sigma_z \end{pmatrix},$$

and their formal “dot” product, σ · n̂, represents (up to the factor of ħ/2) the matrix
for the observable S_n̂ (the measurement of spin in the most general direction defined
by a unit vector n̂). This will be developed and used in the next course, CS 83B.
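
As a sanity check on the linear combination above, this Python sketch builds σ · n̂ for one assumed unit vector and compares it with the closed-form matrix. The function name `sigma_dot_n` and the chosen direction are made up for this example.

```python
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

def sigma_dot_n(nx, ny, nz):
    """Formal dot product n_x*sigma_x + n_y*sigma_y + n_z*sigma_z."""
    return [[nx * SIGMA_X[r][c] + ny * SIGMA_Y[r][c] + nz * SIGMA_Z[r][c]
             for c in range(2)] for r in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

nx, ny, nz = 2 / 3, 1 / 3, 2 / 3          # assumed real unit vector
m = sigma_dot_n(nx, ny, nz)

# Closed form read off the displayed equation.
closed_form = [[nz, nx - 1j * ny],
               [nx + 1j * ny, -nz]]

m_squared = matmul(m, m)                  # the identity when n is a unit vector
```

Because the coefficients are real, the resulting matrix equals its own conjugate transpose, which is what an observable's matrix requires.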


Measurement 


As a combination of both bit flip (which swaps probabilities) and phase flip (which
does not change probabilities), Y has the same measurement consequence as a simple
QNOT:

    |ψ⟩ ──┬──[ Y ]──┬──
          │         │
          ▼         ▼
          A         B

This gives an A-to-B transition

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \longrightarrow \begin{pmatrix} -i\beta \\ i\alpha \end{pmatrix},$$

which causes the probabilities, |α|² and |β|², to get swapped at access point B.


9.5.5 The Hadamard Gate, H 


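While all these quantum gates are essential, the one having the most far-reaching
consequences and personifying the essence of quantum logic is the Hadamard gate.


The Hadamard operator, H, is defined by the matrix

$$H \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Traditionally, we first address its effect on the CBS states,

$$H|0\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \quad \text{and}$$

$$H|1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$

We immediately recognize that this “rotates” the z-basis kets onto the x-basis kets,

$$H|0\rangle = |0\rangle_x \quad \text{and} \quad H|1\rangle = |1\rangle_x.$$

[Exercise. Prove that H is unitary.]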


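H of a General State


This is to be extended to an arbitrary state, |ψ⟩, using the obvious rules. Rather than
approach it by expanding |ψ⟩ along the z-basis then extending linearly, it’s perhaps
faster to view everything in terms of matrices and column vectors,

$$H|\psi\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha + \beta \\ \alpha - \beta \end{pmatrix},$$

which can be grouped

$$H|\psi\rangle = \left( \frac{\alpha + \beta}{\sqrt{2}} \right) |0\rangle + \left( \frac{\alpha - \beta}{\sqrt{2}} \right) |1\rangle.$$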


Gate Symbol and Circuits 


In a circuit diagram, we use

    ──[ H ]──

to express an H gate. Its full effect on a general state in diagram form is

    α|0⟩ + β|1⟩ ──[ H ]── ((α + β)/√2)|0⟩ + ((α − β)/√2)|1⟩ .


Computational Basis State Emphasis 


It turns out to be very useful to express H in compact computational basis form. I’ll
give you the answer, and let you prove it for yourself.

$$|x\rangle \ \longrightarrow \boxed{H} \longrightarrow \ \frac{|0\rangle + (-1)^x\,|1\rangle}{\sqrt{2}}$$

[Exercise. Show that this formula gives the right result on each of the two CBS
kets.]




Measurement 


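Compared to the previous gates, the Hadamard gate has a more complex and subtle
effect on measurement probabilities. The circuit

    |ψ⟩ ──┬──[ H ]──┬──
          │         │
          ▼         ▼
          A         B

gives an A-to-B transition

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \longrightarrow \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha + \beta \\ \alpha - \beta \end{pmatrix},$$

causing the output probabilities to go from |α|² and |β|² at point A to

$$P\big(H|\psi\rangle \searrow |0\rangle\big) = \frac{|\alpha + \beta|^2}{2}, \qquad P\big(H|\psi\rangle \searrow |1\rangle\big) = \frac{|\alpha - \beta|^2}{2}$$

at point B.

Notation. The diagonal arrow, ↘, is to be read “when measured, collapses to.”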


To make this concrete, here are the transition probabilities of a few input states 
(details left as an exercise). 


|ψ⟩                   Probabilities Before     Probabilities After
--------------------  -----------------------  ------------------------
(1, 0)^t              P(↘|0⟩) = 1              P(↘|0⟩) = 1/2
                      P(↘|1⟩) = 0              P(↘|1⟩) = 1/2

(1/√2, −1/√2)^t       P(↘|0⟩) = 1/2            P(↘|0⟩) = 0
                      P(↘|1⟩) = 1/2            P(↘|1⟩) = 1

(i√.3, −√.7)^t        P(↘|0⟩) = .3             P(↘|0⟩) = 1/2
                      P(↘|1⟩) = .7             P(↘|1⟩) = 1/2

(1/2, √3/2)^t         P(↘|0⟩) = 1/4            P(↘|0⟩) = (2 + √3)/4
                      P(↘|1⟩) = 3/4            P(↘|1⟩) = (2 − √3)/4


[Exercise. Verify these probabilities.]
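
If you want a numerical cross-check after working the exercise by hand, the following Python sketch applies H to four sample inputs and computes the before and after probabilities. The helper names are assumptions of this example, and the third input uses amplitudes of magnitude √.3 and √.7 with a 90° relative phase, which is one state consistent with the tabulated probabilities.

```python
import math

S2 = 1 / math.sqrt(2)
H = [[S2, S2],
     [S2, -S2]]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def probabilities(state):
    """P(collapse to |0>), P(collapse to |1>) for a normalized state."""
    return [abs(c) ** 2 for c in state]

inputs = [
    [1, 0],
    [S2, -S2],
    [1j * math.sqrt(0.3), -math.sqrt(0.7)],   # assumed phases, see lead-in
    [0.5, math.sqrt(3) / 2],
]

before = [probabilities(psi) for psi in inputs]
after = [probabilities(apply(H, psi)) for psi in inputs]
```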




9.5.6 Phase-Shift Gates, S, T and R_θ


The phase-flip gate, Z, is a special case of a more general (relative) phase shift
operation in which the coefficient of |1⟩ is “shifted” by θ = π radians. There are
two other common shift amounts, π/2 (the S operator) and π/4 (the T operator).
Beyond that we use the most general amount, any θ (the R_θ operator). Of course,
they can all be defined in terms of R_θ, so we'll define that one first.


The phase shift operator, R_θ, is defined by the matrix

$$R_\theta \equiv \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix},$$

where θ is any real number. It leaves the coefficient of |0⟩ unchanged and “shifts” (or
“rotates”) |1⟩’s coefficient by a relative angle θ, whose meaning we will discuss in a
moment. Here’s the effect on a general state:

$$R_\theta|\psi\rangle = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha \\ e^{i\theta}\beta \end{pmatrix}.$$

The operators S and T are defined in terms of R_θ,

$$S \equiv R_{\pi/2} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},$$

and

$$T \equiv R_{\pi/4} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.$$


Vocabulary 


S is called the “phase gate,” and in an apparent naming error T is referred to as the
“π/8 gate,” but this is not actually an error as much as it is a change of notation.
Like a state vector, any unitary operator on state space can be multiplied by a unit
scalar with impunity. We’ve seen the reason, but to remind you, state vectors are
rays in state space, all vectors on the ray considered to be the same state. Since
we are also working on the projective sphere, any unit scalar not only represents the
same state, but keeps the vector on the projective sphere.


With that in mind, if we want to see a more balanced version of T, we multiply
it by e^{−iπ/8} and get the equivalent operator,

$$T \cong \begin{pmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{pmatrix}.$$

Measurement 


These phase shift gates leave the probabilities alone for single qubit systems, just as 
the phase flip gate, Z, did. You can parallel the exposition presented for Z in the 
current situation to verify this. 
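
A minimal Python sketch of the phase-shift family follows. It checks that two T gates in series act like S, that two S gates act like Z, and that a phase shift leaves single-qubit probabilities alone. The function name `R` and the sample state are assumptions of this example.

```python
import cmath
import math

def R(theta):
    """Phase shift matrix diag(1, e^{i theta})."""
    return [[1, 0],
            [0, cmath.exp(1j * theta)]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

S = R(math.pi / 2)
T = R(math.pi / 4)
Z = R(math.pi)

t_twice = matmul(T, T)       # numerically equal to S
s_twice = matmul(S, S)       # numerically equal to Z

# The "balanced" form of T differs from T only by the unit scalar e^{-i pi/8}.
balanced_T = [[cmath.exp(-1j * math.pi / 8) * entry for entry in row] for row in T]

psi = [0.6, 0.8j]            # assumed sample state
p_before = [abs(c) ** 2 for c in psi]
p_after = [abs(c) ** 2 for c in apply(T, psi)]
```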




9.6 Putting Unary Gates to Use 


9.6.1 Basis Conversion 


Every quantum gate is unitary, a fact that has two important consequences, the first 
of which we have already noted. 


1. Quantum gates are reversible; their adjoints can be used to undo their action. 


2. Quantum gates map any orthonormal CBS to another orthonormal CBS. 


The second item is true using logic from our linear algebra lesson as follows. Unitary
operators preserve inner products. For example, since the natural CBS { |0⟩, |1⟩ }
(which we can also express using the more general notation $\{\,|x\rangle\,\}_{x=0}^{1}$) satisfies the
orthonormality relation

$$\langle x \,|\, y \rangle = \delta_{xy} \quad \text{(Kronecker delta)},$$

then the unitarity of U means we also have

$$\langle x \,|\, U^\dagger U \,|\, y \rangle = \delta_{xy}.$$

(I’ll remind you that the LHS of the last equation is nothing but the inner product of
U|y⟩ with U|x⟩ expressed in terms of the adjoint conversion rules introduced in our
lesson on quantum mechanics.)


That tells us that { U|0⟩, U|1⟩ } are orthonormal, and since the dimension of
H = 2 they also span the space. In other words, they form an orthonormal basis,
as claimed.


Let’s call this the basis conversion property of unitary transformations and make 
it a theorem. 


Theorem (Basis Conversion Property). If U is a unitary operator
and A is an orthonormal basis, then U(A), i.e., the image of the vectors of A
under U, is another orthonormal basis, B.


[Exercise. Prove the theorem for any dimension, N. Hint: Let b_k = U(a_k)
be the kth vector produced by subjecting a_k ∈ A to U. Review the inner product-
preserving property of a unitary operator and apply that to any two vectors b_k
and b_j in the image of A. What does that say about the full set of vectors B =
U(A)? Finally, what do you know about the number of vectors in the basis for an
N-dimensional vector space?]
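
For the 2-dimensional case, the property can be spot-checked in Python by computing all four inner products among the images of the CBS kets. This is only a sketch; `inner`, `gram` and the non-unitary counterexample matrix are names and choices assumed for this illustration.

```python
import math

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def gram(gate):
    """Matrix of inner products among the images of |0> and |1>."""
    images = [apply(gate, [1, 0]), apply(gate, [0, 1])]
    return [[inner(images[r], images[c]) for c in range(2)] for r in range(2)]

S2 = 1 / math.sqrt(2)
H = [[S2, S2], [S2, -S2]]
Y = [[0, -1j], [1j, 0]]
NOT_UNITARY = [[1, 1], [0, 0]]     # the classical [0]-op, for contrast

gram_H = gram(H)                   # identity: images are orthonormal
gram_Y = gram(Y)                   # identity: images are orthonormal
gram_bad = gram(NOT_UNITARY)       # not the identity
```

An identity Gram matrix is just the statement ⟨x|U†U|y⟩ = δ_xy written out entry by entry.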




QNOT and Bases 


When applied to some gates, like QNOT, this is a somewhat trivial observation since
QNOT maps the z-basis to itself:

$$X: \ |0\rangle \mapsto |1\rangle, \qquad X: \ |1\rangle \mapsto |0\rangle.$$


Hadamard and Bases


In other situations, this is a very useful and interesting conversion. For example, if
you look back at its effect on the CBS, the Hadamard gate takes the z-basis to the
x-basis:

$$H: \ |0\rangle \mapsto |0\rangle_x, \qquad H: \ |1\rangle \mapsto |1\rangle_x,$$

and, since every quantum gate’s adjoint is its inverse, and H is self-adjoint (easy
[Exercise]), it works in the reverse direction as well,

$$H: \ |0\rangle_x \mapsto |0\rangle, \qquad H: \ |1\rangle_x \mapsto |1\rangle.$$

[Exercise. Identify the unitary operator that has the effect of converting between
the z-basis and the y-basis.]


9.6.2 Combining Gates 


We can place two or more one-qubit gates in series to create desired results as the 
next two sections will demonstrate. 


An x-Basis QNOT


The QNOT gate (a.k.a. X) swaps the two CBS states, but only relative to the z-basis,
because that’s how we defined it. An easy experiment shows that this is not true of
another basis, such as the x-basis:

$$X|1\rangle_x = X \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right) = \frac{X|0\rangle - X|1\rangle}{\sqrt{2}} = \frac{|1\rangle - |0\rangle}{\sqrt{2}} = -|1\rangle_x \cong |1\rangle_x.$$

[Exercise. To what is the final equality due? What is X|0⟩_x?]



[Exercise. Why is this not a surprise? Hint: Revive your quantum mechanics
knowledge. X is proportional to the matrix for the observable, S_x, whose eigenvectors
are |0⟩_x and |1⟩_x, by definition. What is an eigenvector?]


If we wanted to construct a gate, QNOT_x, that does have the desired swapping
effect on the x-basis we could approach it in a number of ways, two of which are


1. brute force using linear algebra, and


2. a gate-combination using the basis-transforming power of the H-gate.


I’ll outline the first approach, leaving the details to you, then show you the second
approach in its full glory.


Brute Force. We assert the desired behavior by declaring it to be true (and
confirming that our guess results in a unitary transformation). Here, that means
stating that QNOT_x |0⟩_x = |1⟩_x and QNOT_x |1⟩_x = |0⟩_x. Express this as two
equations involving the matrix for QNOT_x and the x-basis kets in coordinate form
(everything in z-basis coordinates, of course). You'll get four simultaneous equations
for the four unknown matrix elements of QNOT_x. This creates a definition in terms
of the natural CBS. We confirm it’s unitary and we’ve got our gate.


[Exercise. Fill in the details for QNOT_x.]


Gate Combination. We know that QNOT (i.e., X) swaps the z-CBS, and H 
converts between z-CBS and x-CBS. So we use H to map the x-basis to the z-basis, 
apply X to the z-basis and convert the results back to the x-basis: 


--[ H ]--[ X ]--[ H ]--


Let’s confirm that our plan worked. 


Note. Gates operate from left-to-right, but operator algebra moves from right- 
to-left. When translating a circuit into a product of matrices, we must reverse the 
order. In this case, we can’t “see” that effect, since the gate is symmetric, but it 
won’t always be, so stay alert. 


Substituting the matrices for their gates (and reversing order), we get 
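$$\mathrm{QNOT}_x \;=\; (H)\,(X)\,(H) \;=\; \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

$$=\; \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \;=\; \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} ,$$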


and we have our matrix. It’s easy to confirm that the matrix swaps the x-CBS kets 
and is identical to the matrix we would get using brute force (I’ll let you check that). 
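
If you would like to let a computer do the checking, here is a minimal Python sketch of the gate-combination argument. It assumes the standard matrices for H, X and Z in z-basis coordinates; the helper names (`matmul`, `apply`, `close`) are made up for this illustration and are not part of the course material.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Apply matrix m to a column vector v (a flat list)."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two flat lists, tolerant of rounding."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

# The circuit H -> X -> H; as a matrix product this is H * X * H.
QNOT_x = matmul(H, matmul(X, H))

ket0_x = [r, r]     # |0>_x in z-basis coordinates
ket1_x = [r, -r]    # |1>_x in z-basis coordinates

swaps_0 = close(apply(QNOT_x, ket0_x), ket1_x)   # |0>_x -> |1>_x ?
swaps_1 = close(apply(QNOT_x, ket1_x), ket0_x)   # |1>_x -> |0>_x ?
equals_Z = all(close(row, z_row) for row, z_row in zip(QNOT_x, Z))
```

All three flags should come out true, up to floating-point rounding in the factors of 1/√2.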




Circuit Identities 


The above example has a nice side-effect. By comparing the result with one of our 
basic gates, we find that H → X → H is equivalent to Z,


--[ H ]--[ X ]--[ H ]--   =   --[ Z ]--


There are more circuit identities that can be generated, some by looking at the 
matrices, and others by thinking about the effects of the constituent gates and con- 
firming your guess through matrix multiplication. 


Here is one you can verify, 


--[ H ]--[ Z ]--[ H ]--   =   --[ X ]-- ,


and here’s another, 


--[ Z ]--[ Z ]--   =   --[ X ]--[ X ]--   =   --[ Y ]--[ Y ]--   =   1


The last pattern is true for any quantum logic gate, U, which is self-adjoint because
then $U^2 = U^\dagger U = \mathbb{1}$, the first equality by “self-adjoint-ness” and the second by
unitarity.

Some operator equivalences are not shown in gate form, but rather using the
algebraic operators. For example,

$$XZ \;=\; -iY$$

or

$$XYZ \;=\; -ZYX \;=\; i\,\mathbb{1} .$$


That’s because the algebra shows a global phase factor which may appear awkward 
in gate form yet is still important if the combination is to be used in a larger circuit. 
As you may recall, even though a phase factor may not have observable consequences 
on the state alone, if that state is combined with other states prior to measurement, 
the global phase factor can turn into a relative phase difference, which does have 
observable consequences. 


I will finish by reminding you that the algebra and the circuit are read in opposite 
order. Thus 


$$XYZ \;=\; i\,\mathbb{1}$$


corresponds to the circuit diagram


--[ Z ]--[ Y ]--[ X ]--   =   i 1
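
To make the "opposite order" rule concrete, here is a small Python sketch. The function `circuit` is a hypothetical helper (not a library call) that takes gates in left-to-right wire order and multiplies their matrices right-to-left; the Pauli matrices are the standard ones.

```python
from functools import reduce

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def circuit(*gates):
    """Matrix of a circuit whose gates are listed in left-to-right wire order.

    The first gate listed acts first, so it ends up as the rightmost factor.
    """
    return reduce(lambda acc, gate: matmul(gate, acc), gates)

zyx_wire_order = circuit(Z, Y, X)        # wire: Z, then Y, then X
xyz_algebra = matmul(X, matmul(Y, Z))    # algebra: the product X Y Z
reversed_wire = circuit(X, Y, Z)         # wire: X, then Y, then Z, i.e. Z Y X
```

Here `zyx_wire_order` and `xyz_algebra` are the same matrix, i times the identity, while `reversed_wire` comes out as -i times the identity, so the two orderings differ by exactly the kind of phase factor discussed above.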


This completes the basics of Qubits and their unary operators. There is one final 
topic that every quantum computer scientist should know. It is not going to be used 
much in this course, but will appear in CS 83B and CS 83C. It belongs in this chapter, 
so consider it recommended, but not required. 




9.7 The Bloch Sphere 


9.7.1 Introduction 


Our goal is to find a visual 3-D representation for the qubits in H. To that end, we 
will briefly allude to the lecture on quantum mechanics. 


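If you studied the optional time evolution of a general spin state corresponding
to a special physical system (an electron in a constant magnetic field B), you learned
that the expectation values of all three observables formed a real 3-D time-evolving
vector,

$$\mathbf{s}(t) \;\equiv\; \begin{pmatrix} \langle S_x \rangle_{\psi(t)} \\ \langle S_y \rangle_{\psi(t)} \\ \langle S_z \rangle_{\psi(t)} \end{pmatrix} ,$$

which precesses around the direction of B.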


Do Not Panic. You don’t have to remember or even re-study that section. This 
is merely a reference in case you want to connect that material with the following, 
which is otherwise self-contained. 


The time evolution started out in an initial spin state at time t = 0, and we 
followed its development at later times. However, today, we need only consider the 
fixed state at its initial time. No evolution is involved.


With that scary introduction, let’s calmly look at any general state’s expansion 
along the preferred CBS, 


$$|\psi\rangle \;=\; c_1 |0\rangle \,+\, c_2 |1\rangle .$$


9.7.2 Rewriting $|\psi\rangle$


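We start by finding a more informative representation of $|\psi\rangle$. First, express $c_1$ and
$c_2$ in polar form, giving the equivalent state

$$|\psi\rangle \;=\; \begin{pmatrix} c\, e^{i\phi_1} \\ s\, e^{i\phi_2} \end{pmatrix} .$$

Then multiply by the unit scalar $e^{-i\left(\frac{\phi_1 + \phi_2}{2}\right)}$ to get a more balanced equivalent state,

$$|\psi\rangle \;\cong\; \begin{pmatrix} c\, e^{\,i\left(\frac{\phi_1 - \phi_2}{2}\right)} \\ s\, e^{-i\left(\frac{\phi_1 - \phi_2}{2}\right)} \end{pmatrix} .$$

Now, simplify by making the substitution

$$\phi \;\equiv\; \phi_1 - \phi_2$$

to get a balanced Hilbert space representative of our qubit,

$$|\psi\rangle \;\cong\; \begin{pmatrix} c\, e^{\,i\phi/2} \\ s\, e^{-i\phi/2} \end{pmatrix} .$$

As a qubit, $|\psi\rangle$ is normalized,

$$c^2 + s^2 \;=\; 1 ,$$

so c and s can be equated with the cosine and sine of some angle which we call $\theta/2$, i.e.,

$$c \;=\; \cos\frac{\theta}{2} \qquad \text{and} \qquad s \;=\; \sin\frac{\theta}{2} .$$

We'll see why we pick $\theta/2$, not $\theta$, next.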


9.7.3 The Expectation Vector for $|\psi\rangle$


The three logic gates, X, Y and Z, are represented by unitary matrices, but they also
happen to be Hermitian. This authorizes us to consider them observables defined by
those matrices. In fact, we already related them to the observables $S_x$, $S_y$ and $S_z$,
the spin measurements along the principal axes, notwithstanding the removal of the
factor $\frac{\hbar}{2}$. Therefore, each of these observables has an expectation value: a prediction
about the average of many experiments in which we repeatedly send lots of qubits
in the same state, $|\psi\rangle$, through one of these gates, measure the output (causing a
collapse), compute the average for many trials, and do so for all three gates X, Y
and Z. We define a real 3-D vector, s, that collects the expectation values into one
column, to be

$$\mathbf{s} \;\equiv\; \begin{pmatrix} \langle X \rangle_\psi \\ \langle Y \rangle_\psi \\ \langle Z \rangle_\psi \end{pmatrix} .$$

At the end of the quantum mechanics lecture we essentially computed these expectation
values. If you like, go back and plug t = 0 into the formula there. Aside from
the factor of $\frac{\hbar}{2}$ (caused by the difference between X/Y/Z and $S_x$/$S_y$/$S_z$), you will get

$$\mathbf{s} \;=\; \begin{pmatrix} \sin\theta \cos\phi \\ -\sin\theta \sin\phi \\ \cos\theta \end{pmatrix} .$$

By defining c and s in terms of $\theta/2$ we ended up with expectation values that had the
whole angle, $\theta$, in them. This is a unit vector in $\mathbb{R}^3$ whose spherical coordinates are
$(1, \theta, -\phi)^T$, i.e., it has a polar angle $\theta$ and azimuthal angle $-\phi$. It is a point on the
unit sphere.
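
For readers who want a numerical cross-check, the following Python sketch computes the three expectation values directly from the Pauli matrices. It assumes the balanced representative $(\cos\frac{\theta}{2}\, e^{i\phi/2},\ \sin\frac{\theta}{2}\, e^{-i\phi/2})^T$ for the state; the angle values and function names are arbitrary choices for illustration.

```python
import cmath
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def expectation(op, psi):
    """Return <psi| op |psi> for a Hermitian 2x2 matrix op (a real number)."""
    op_psi = [sum(op[i][k] * psi[k] for k in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2)).real

def expectation_vector(psi):
    """Collect (<X>, <Y>, <Z>) for the state psi into one list."""
    return [expectation(op, psi) for op in (X, Y, Z)]

theta, phi = 1.1, 0.7   # arbitrary illustrative angles
psi = [math.cos(theta / 2) * cmath.exp(1j * phi / 2),
       math.sin(theta / 2) * cmath.exp(-1j * phi / 2)]

s = expectation_vector(psi)
predicted = [math.sin(theta) * math.cos(phi),
             -math.sin(theta) * math.sin(phi),
             math.cos(theta)]
```

The lists `s` and `predicted` should agree to rounding error, and the squares of the entries of `s` should sum to 1, which is the "point on the unit sphere" statement.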




9.7.4 Definition of the Bloch Sphere 


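The sphere in $\mathbb{R}^3$ defined by

$$\left\{\, \hat{\mathbf n} \;:\; |\hat{\mathbf n}| = 1 \,\right\}$$

is called the Bloch sphere when the coordinates of each point on the sphere $\hat{\mathbf n} = (x, y, z)^T$
are interpreted as the three expectation values $\langle X \rangle$, $\langle Y \rangle$ and $\langle Z \rangle$ for some
qubit state, $|\psi\rangle$. Each qubit value, $|\psi\rangle$, in $\mathcal{H}$ corresponds to a point $\hat{\mathbf n}$ on the Bloch
sphere.


If we use spherical coordinates to represent points on the sphere, then $\hat{\mathbf n} = (1, \theta, \phi)^T$
corresponds to the $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$ in our Hilbert space $\mathcal{H}$ according to

$$\hat{\mathbf n} \;=\; \begin{pmatrix} 1 \\ \theta \\ \phi \end{pmatrix}_{\text{Sph}} \in \ \text{Bloch sphere} \quad \longleftrightarrow \quad |\psi\rangle \;=\; \begin{pmatrix} \cos\left(\frac{\theta}{2}\right) e^{-i\phi/2} \\ \sin\left(\frac{\theta}{2}\right) e^{\,i\phi/2} \end{pmatrix} \in \ \mathcal{H} .$$


Now we see that a polar angle, $\theta$, of a point on the Bloch sphere gives the magnitudes
of its corresponding qubit coordinates, but not directly; when $\theta$ is the polar angle,
$\theta/2$ is used (through sine and cosine) for the qubit coordinate magnitudes.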
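
Here is a hedged Python sketch of the correspondence in both directions. The function names `bloch_to_qubit` and `qubit_to_bloch` are invented for this example, and the sign convention for the phases is the one that makes the azimuthal angle come out as $+\phi$; a different but equivalent convention would differ only by a global phase.

```python
import cmath
import math

def bloch_to_qubit(theta, phi):
    """Qubit amplitudes (alpha, beta) for the Bloch point (1, theta, phi)."""
    return (math.cos(theta / 2) * cmath.exp(-1j * phi / 2),
            math.sin(theta / 2) * cmath.exp(1j * phi / 2))

def qubit_to_bloch(alpha, beta):
    """Cartesian Bloch vector (<X>, <Y>, <Z>) of a normalized qubit."""
    cross = alpha.conjugate() * beta
    return (2 * cross.real, 2 * cross.imag, abs(alpha) ** 2 - abs(beta) ** 2)

north = qubit_to_bloch(*bloch_to_qubit(0.0, 0.0))            # the state |0>
equator = qubit_to_bloch(*bloch_to_qubit(math.pi / 2, 0.0))  # the state |0>_x
south = qubit_to_bloch(*bloch_to_qubit(math.pi, 0.0))        # the state |1>

# A global phase changes the amplitudes but not the Bloch point.
alpha, beta = bloch_to_qubit(1.1, 0.7)
phase = cmath.exp(0.3j)
same_point = qubit_to_bloch(phase * alpha, phase * beta)
```

Notice that a polar angle of π (the south pole) is needed to reach |1⟩, even though the amplitudes only ever see the half angle π/2.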




Chapter 10 


Tensor Products 


V ⊗ W


10.1 Tensor Product for Quantum Computing 


While quantum unary gates are far richer in variety and applicability than classical 
unary gates, there is only so much fun one can have with circuit elements, like 


--[ U ]--


which have a single input. We need to combine qubits (a process called quantum
entanglement), and to do that we'll need gates that have, at a minimum, two inputs,


--|   |--
  | U |
--|   |--


(You may notice that there are also two outputs, an inevitable consequence of unitarity
that we'll discuss in this hour.)


In order to feed two qubits into a binary quantum gate, we need a new tool to 
help us calculate, and that tool is the tensor product of the two single qubit state 
spaces 


$$\mathcal{H} \otimes \mathcal{H} .$$


The concepts of tensors are no harder to master if we define the general tensor product
of any two vector spaces, V and W of dimensions l and m, respectively,


$$V \otimes W ,$$


and this approach will serve us well later in the course. We will then apply what we 
learn by setting V = W =H. 


The tensor product of more than two component spaces like

$$V_0 \otimes V_1 \otimes V_2 \otimes \cdots ,$$

or more relevant to us,

$$\mathcal{H} \otimes \mathcal{H} \otimes \mathcal{H} \otimes \cdots ,$$


presents no difficulty once we have mastered the “order 2 tensors” (product of just 
two spaces). When we need a product of more than two spaces, as we shall in a future 
lecture, I'll guide you. For now, let’s learn what it means to form the tensor product 
of just two vector spaces. 


10.2 The Tensor Product of Two Vector Spaces 


10.2.1 Definitions 


Whenever we construct a new vector space like $V \otimes W$, we need to handle the required
equipment. That means 


1. specifying the scalars and vectors, 
2. defining vector addition and scalar multiplication, and 


3. confirming all the required properties. 


Items 1 and 2 are easy, and we won't be overly compulsive about item 3, so it 
should not be too painful. We’ll also want to cover the two — normally optional but 
for us required — topics, 


4. defining the inner product and 


5. establishing the preferred basis. 


If you find this sort of abstraction drudgery, think about the fact that tensors are the 
requisite pillars of many fields including structural engineering, particle physics and 
general relativity. Your attention here will not go unrewarded. 


Overview 


The new vector space is based on the two vector spaces V (dimension = l) and W
(dimension = m) and is called the tensor product of V and W, written $V \otimes W$. The new
space will turn out to have dimension = lm, the product of the two component space
dimensions.




The Scalars of $V \otimes W$


Both V and W must have a common scalar set in order to form their tensor product,
and that set will be the scalar set for $V \otimes W$. For real vector spaces like $\mathbb{R}^2$ and $\mathbb{R}^3$,
the scalars for


$$\mathbb{R}^2 \otimes \mathbb{R}^3$$


would then be R. For the Hilbert spaces of quantum physics (and quantum comput- 
ing), the scalars are C. 


The Vectors in $V \otimes W$


Vectors of the tensor product space are formed in two stages. I like to
compart-“mental”-ize them as follows:


1. We start by populating $V \otimes W$ with the formal symbols

“$v \otimes w$”

consisting of one vector v from V and another vector w from W. $v \otimes w$ is called
the tensor product of the two vectors v and w. There is no way to further merge
these two vectors; $v \otimes w$ is as concise as their tensor product gets.


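For example, in the case of $(5, -6)^T \in \mathbb{R}^2$ and $(\pi, 0, 3)^T \in \mathbb{R}^3$, they provide the
tensor product vector

$$\begin{pmatrix} 5 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} \pi \\ 0 \\ 3 \end{pmatrix} \;\in\; \mathbb{R}^2 \otimes \mathbb{R}^3 ,$$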


with no further simplification possible (notwithstanding the natural basis coor- 
dinate representation that we will get to, below). 


Caution. No one is saying that all these formal products $v \otimes w$ are distinct from
one another. When we define the operations, we'll see there is much duplication. 


2. The formal vector products constructed in step 1 produce only a small subset
of the tensor product space $V \otimes W$. The most general vector is a finite sum of
such symbols.


The full space $V \otimes W$ consists of all finite sums of the form

$$\sum_k v_k \otimes w_k , \qquad \text{with} \quad v_k \in V \ \text{ and } \ w_k \in W .$$


For example, in the case of the complex vector spaces $V = \mathbb{C}^3$ and $W = \mathbb{C}^4$,
one such typical vector in the “product space” $\mathbb{C}^3 \otimes \mathbb{C}^4$ would be


1g : tj ie D) 
-6 |@}3] + 6 }@] 4 + [2)@] 7% 
32 32 2 : 

A 0 —4 


Although this particular sum can't be further simplified as a combination of
separable tensors, we can always simplify long sums so that there are at most
lm terms in them. That's because (as we'll learn) there are lm basis vectors for
the tensor product space.


Second Caution. Although these sums produce all the vectors in $V \otimes W$,
they do so many times over. In other words, it will not be true that every sum 
created in this step is a distinct tensor from every other sum. 


Vocabulary 


• Product Space. The tensor product of two vector spaces is sometimes referred
to as the product space.


• Tensors. Vectors in the product space are sometimes called tensors, emphasizing
that they live in the tensor product space of two vector spaces. However,
they are still vectors.


• Separable Tensors. Those vectors in $V \otimes W$ which arise in step 1 are
called separable tensors; they can be “separated” into two component vectors,
one from each space, whose product is the (separable) tensor. Step 2 presages
that most tensors in the product space are not separable. Separable tensors are
sometimes called “pure” or “simple”.


• Tensor Product. We can use the term “tensor product” to mean either the
product space or the individual separable tensors. Thus $V \otimes W$ is the tensor
product of two spaces, while $v \otimes w$ is the (separable) tensor product of two
vectors.


Vector Addition 


This operation is built into the definition of a tensor; since the general tensor is the
sum of separable tensors, adding two of them merely produces another sum, which
is automatically a tensor. The twist, if we can call it that, is how we equate those
sums which actually represent the same tensor. This is all expressed in the following
two bullets.


• For any two tensors,

$$\zeta \;=\; \sum_k v_k \otimes w_k \qquad \text{and} \qquad \zeta' \;=\; \sum_j v'_j \otimes w'_j ,$$

their vector sum is the combined sum, i.e.,

$$\zeta + \zeta' \;\equiv\; \sum_k v_k \otimes w_k \;+\; \sum_j v'_j \otimes w'_j ,$$

which simply expresses the fact that a sum of two finite sums is itself a finite
sum and therefore agrees with our original definition of a vector object in the
product space. The sum may need simplification, but it is a valid object in the
product space.


• The tensor product distributes over sums in the component space,

$$(v + v') \otimes w \;=\; v \otimes w \;+\; v' \otimes w \qquad \text{and}$$

$$v \otimes (w + w') \;=\; v \otimes w \;+\; v \otimes w' .$$

Practically, we only need to understand this as a “distributive property,” but
in theoretical terms it has the effect of producing countless sets of equivalent
vectors. That is, it tells how different formal sums in step 2 of “The Vectors
in $V \otimes W$” might represent the same actual tensor.


Commutativity. Vector addition commutes by definition; the formal sums in step 
2 are declared to be order-independent. 


Scalar Multiplication 


Let c be a scalar. We define its product with a tensor in two steps. 


• c Times a Separable Tensor.

$$c\,(v \otimes w) \;\equiv\; (cv) \otimes w \;\equiv\; v \otimes (cw) .$$

This definition requires (indeed, declares) that it doesn't matter which component
of the separable tensor gets the c. Either way, the two tensors formed are the
same. It has the effect of establishing the equivalence of initially distinct formal
symbols $v \otimes w$ in step 1 of “The Vectors in $V \otimes W$.”


• c Times a Sum of (Separable) Tensors.

$$c\,(v_0 \otimes w_0 \,+\, v_1 \otimes w_1) \;\equiv\; c\,(v_0 \otimes w_0) \,+\, c\,(v_1 \otimes w_1) .$$

Because any tensor can be written as the finite sum of separable tensors, this
requirement covers the balance of the product space. You can distribute c over
as large a sum as you like.


• Order. While we can place c on either the left or right of a vector, it is usually
placed on the left, as in ordinary vector spaces.
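
These rules can be seen in action numerically. The sketch below is only an illustration: it borrows the natural-basis coordinate formula for $v \otimes w$ that is derived later in this chapter (each coordinate of v multiplies the whole of w), and the vectors, the scalar and the helper names are all made up for the example.

```python
def kron(v, w):
    """Coordinates of v (x) w: each entry of v multiplies the whole of w."""
    return [a * b for a in v for b in w]

def scale(c, v):
    """Scalar multiple of a coordinate list."""
    return [c * a for a in v]

def add(u, v):
    """Entry-wise sum of two coordinate lists of equal length."""
    return [a + b for a, b in zip(u, v)]

v, v_prime = [5, -6], [2, 1]          # hypothetical vectors in R^2
w, w_prime = [7, 0, 3], [1, 1, 4]     # hypothetical vectors in R^3
c = 4

# Scalar multiplication: the scalar may attach to either factor.
scalar_forms = [scale(c, kron(v, w)), kron(scale(c, v), w), kron(v, scale(c, w))]

# Distributivity in each slot.
left_sum = kron(add(v, v_prime), w)
left_split = add(kron(v, w), kron(v_prime, w))
right_sum = kron(v, add(w, w_prime))
right_split = add(kron(v, w), kron(v, w_prime))
```

The three entries of `scalar_forms` are identical lists, and each "sum" matches its "split" counterpart, which is what the defining equivalences demand.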




The Requisite Properties of $V \otimes W$


Properties like commutativity of addition and distributivity of scalar multiplication are
automatic consequences of the above definitions. I'll leave their verification as an optional exercise.


[Exercise. Prove that $V \otimes W$ satisfies the various operational requirements of a
vector space.]


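Inner Products in $V \otimes W$


If V and W possess inner products, an inner product is conferred to the product
space $V \otimes W$ by

$$\langle\, v \otimes w \;|\; v' \otimes w' \,\rangle \;\equiv\; \langle v \,|\, v' \rangle \cdot \langle w \,|\, w' \rangle ,$$

where “·” on the RHS is scalar multiplication. This only defines the inner product of
two separable tensors; however, we extend this to all tensors by asserting a distributive
property (or if you prefer the terminology, by “extending linearly”). For example,


$$\langle\, v \otimes w \;|\; v' \otimes w' \,+\, v'' \otimes w'' \,\rangle \;=\; \langle\, v \otimes w \;|\; v' \otimes w' \,\rangle \;+\; \langle\, v \otimes w \;|\; v'' \otimes w'' \,\rangle .$$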
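
As a quick sanity check of this definition, the Python sketch below compares the ordinary inner product of natural-basis coordinates (a formula covered later in the chapter) against the product of the component inner products. The vectors are arbitrary made-up examples, and the bra is conjugated, following the physics convention.

```python
def kron(v, w):
    """Coordinates of v (x) w: each entry of v multiplies the whole of w."""
    return [a * b for a in v for b in w]

def inner(a, b):
    """<a|b> with the bra (left argument) conjugated."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

v, w = [1 + 1j, -6], [1, 2, 3j]     # hypothetical vectors in C^2 and C^3
v2, w2 = [2, 1j], [0, 1, 1]
v3, w3 = [1, 1], [1j, 0, 2]

lhs = inner(kron(v, w), kron(v2, w2))
rhs = inner(v, v2) * inner(w, w2)

# Extending linearly in the ket slot.
combined = [a + b for a, b in zip(kron(v2, w2), kron(v3, w3))]
lhs_sum = inner(kron(v, w), combined)
rhs_sum = lhs + inner(kron(v, w), kron(v3, w3))
```

Both pairs (`lhs`, `rhs`) and (`lhs_sum`, `rhs_sum`) should agree.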


[Exercise. Compute the inner product in $\mathbb{R}^2 \otimes \mathbb{R}^3$


T 1 
C(S)e(o} | Cele} } | 
3 3 
[Exercise. Compute the inner product in $\mathbb{C}^3 \otimes \mathbb{C}^4$


. 1 : 1 

14+i2 9 —1 2 

( —6 |] ® 3 + —6 0 
4 1 


J 


[Exercise. Prove that the definition of inner product satisfies all the usual
requirements of a dot or inner product. Be sure to cover distributivity and positive
definiteness.]


3t 


re 


The Natural Basis for $V \otimes W$


Of all the aspects of tensor products, the one that we will use most frequently is the 
preferred basis. 


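Tensor Product Basis Theorem (Orthonormal Version). If V has
dimension l with orthonormal basis

$$\left\{ v_k \right\}_{k=0}^{l-1} ,$$

and W has dimension m, with orthonormal basis

$$\left\{ w_j \right\}_{j=0}^{m-1} ,$$

then $V \otimes W$ has dimension lm and “inherits” a natural (preferred)
orthonormal basis

$$\left\{ v_k \otimes w_j \right\}_{k,\,j \,=\, 0,\,0}^{l-1,\; m-1}$$

from V and W. For example, the natural basis for $\mathbb{R}^2 \otimes \mathbb{R}^3$ is

$$\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} .$$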


Proof of Basis Theorem. I'll guide you through the proof, and you can fill in 
the gaps as an exercise if you care to. 


Spanning. A basis must span the space. We need to show that any tensor can
be expressed as a linear combination of the alleged basis vectors $v_k \otimes w_j$. This is an
easy two-parter:


1. Any $v \in V$ can be expanded along the V-basis as

$$v \;=\; \sum_k \alpha_k v_k$$

and any $w \in W$ can be expanded along the W-basis as

$$w \;=\; \sum_j \beta_j w_j ,$$

which implies that any separable tensor $v \otimes w$ can be expressed

$$v \otimes w \;=\; \sum_{k,\,j} \alpha_k \beta_j \, (v_k \otimes w_j) .$$


[Exercise. Prove the last identity by applying linearity (distributive properties
of the tensor product space).]


2. Any tensor is a sum of separable tensors, so item 1 tells us that it, too, can
be expressed as a linear combination of $v_k \otimes w_j$. [Exercise. Demonstrate this
algebraically.]




Linear Independence and Orthonormality. We rely on a little theorem to
which I subjected you twice, first in the linear algebra lecture, then again in the
Hilbert space lecture. It said that orthonormality ⇒ linear independence. We
now show that the set $\{ v_k \otimes w_j \}$ is an orthonormal collection of vectors.


$$\langle\, v_k \otimes w_j \;|\; v_{k'} \otimes w_{j'} \,\rangle \;=\; \langle v_k \,|\, v_{k'} \rangle \, \langle w_j \,|\, w_{j'} \rangle \;=\; \begin{cases} 1, & \text{if } k = k' \text{ and } j = j' \\ 0, & \text{otherwise} \end{cases}$$


This is the definition of orthonormality, so by our little theorem the vectors in
$\{ v_k \otimes w_j \}$ are linearly independent. QED


The basis theorem works even if the component bases are not orthonormal. Of 
course, in that case, the inherited tensor basis is not orthonormal. 


Tensor Product Basis Theorem (General Version). If V has
dimension l with basis

$$\left\{ v_k \right\}_{k=0}^{l-1} ,$$

and W has dimension m, with basis

$$\left\{ w_j \right\}_{j=0}^{m-1} ,$$

then $V \otimes W$ has dimension lm and “inherits” a natural (preferred) basis

$$\left\{ v_k \otimes w_j \right\}_{k,\,j \,=\, 0,\,0}^{l-1,\; m-1}$$

from V and W.


If V and W each have an inner product (true for any of our vector spaces) this 
theorem follows immediately from the orthonormal basis theorem. 


[Exercise. Give a one sentence proof of this theorem based on the orthonormal 
product basis theorem. ] 


The theorem is still true, even if the two spaces don’t have inner products, but 
we won’t bother with that version. 


Practical Summary of the Tensor Product Space


While we have outlined a rigorous construction for a tensor product space, it is usually
good enough for computer scientists to characterize the product space in terms of the
tensor basis.


The product space, $U = V \otimes W$, consists of tensors, u, expressible as sums of
the separable basis

$$\left\{\, v_k \otimes w_j \;\middle|\; k = 0, \ldots, (l-1), \ \ j = 0, \ldots, (m-1) \,\right\} ,$$

weighted by scalar weights $c_{kj}$,

$$u \;=\; \sum_{k=0}^{l-1} \sum_{j=0}^{m-1} c_{kj} \, (v_k \otimes w_j) .$$


The lm basis tensors, $v_k \otimes w_j$, are induced by the two component bases,

$$\mathcal{V} \;=\; \left\{ v_k \right\}_{k=0}^{l-1} \qquad \text{and} \qquad \mathcal{W} \;=\; \left\{ w_j \right\}_{j=0}^{m-1} .$$


The sums, products and equivalence of tensor expressions are defined by the required
distributive and commutative properties listed earlier, but can often be taken as the
natural rules one would expect.


Conventional Order of Tensor Basis 


While not universal, when we need to list the tensor basis linearly, the most common 
convention is to let the left basis index increment slowly and the right increment 
quickly. It is “V-major / W-minor format” if you will, an echo of the row-major 
(column-minor) ordering choice of arrays in computer science, 


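$$\begin{array}{l} \{\; v_0 \otimes w_0, \ v_0 \otimes w_1, \ v_0 \otimes w_2, \ \ldots, \ v_0 \otimes w_{m-1}, \\ \ \ \; v_1 \otimes w_0, \ v_1 \otimes w_1, \ v_1 \otimes w_2, \ \ldots, \ v_1 \otimes w_{m-1}, \\ \qquad \vdots \\ \ \ \; v_{l-1} \otimes w_0, \ v_{l-1} \otimes w_1, \ v_{l-1} \otimes w_2, \ \ldots, \ v_{l-1} \otimes w_{m-1} \;\} . \end{array}$$


You might even see these basis tensors labeled using a shorthand like $\zeta_{kj}$,


$$\begin{array}{l} \{\; \zeta_{00}, \ \zeta_{01}, \ \zeta_{02}, \ \ldots, \ \zeta_{0(m-1)}, \\ \ \ \; \zeta_{10}, \ \zeta_{11}, \ \zeta_{12}, \ \ldots, \ \zeta_{1(m-1)}, \\ \qquad \vdots \\ \ \ \; \zeta_{(l-1)0}, \ \zeta_{(l-1)1}, \ \zeta_{(l-1)2}, \ \ldots, \ \zeta_{(l-1)(m-1)} \;\} . \end{array}$$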
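
For programmers, the V-major convention is just the familiar row-major index arithmetic. A minimal sketch follows; `flat_index` is a name invented here, and the dimensions are a hypothetical example.

```python
def flat_index(k, j, m):
    """Position of v_k (x) w_j in the V-major list, where m = dim W."""
    return k * m + j

l, m = 2, 3   # hypothetical dimensions of V and W

# Left index changes slowly, right index changes quickly.
ordered_labels = [(k, j) for k in range(l) for j in range(m)]
positions = [flat_index(k, j, m) for k, j in ordered_labels]
```

With these dimensions, `ordered_labels` runs (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), and `positions` counts 0 through 5 in step with it.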




10.2.2 Tensor Coordinates from Component-Space Coordinates


We have defined everything relating to a tensor product space $V \otimes W$. Before we
apply that to qubits, we must make sure we can quickly write down coordinates of
a tensor in the preferred (natural) basis. This starts, as always, with (i) separable
tensors, from which we move to the (ii) basis tensors, and finally graduate to (iii)
general tensors.


Natural Coordinates of Separable Tensors 


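We are looking for the coordinates of a pure tensor, expressible as a product of two
component vectors whose preferred coordinates we already know,

$$v \otimes w \;=\; \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{l-1} \end{pmatrix} \otimes \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{m-1} \end{pmatrix} .$$


[A Reminder. We are numbering starting with 0, rather than 1, now that we are in
computing lessons.]

I'm going to give you the answer immediately, and allow you to skip the explanation
if you are in a hurry.


$$\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{l-1} \end{pmatrix} \otimes \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{m-1} \end{pmatrix} \;=\; \begin{pmatrix} c_0 \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{m-1} \end{pmatrix} \\[2ex] c_1 \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{m-1} \end{pmatrix} \\ \vdots \\ c_{l-1} \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{m-1} \end{pmatrix} \end{pmatrix} \;=\; \begin{pmatrix} c_0 d_0 \\ c_0 d_1 \\ \vdots \\ c_0 d_{m-1} \\ c_1 d_0 \\ c_1 d_1 \\ \vdots \\ c_1 d_{m-1} \\ \vdots \\ c_{l-1} d_0 \\ c_{l-1} d_1 \\ \vdots \\ c_{l-1} d_{m-1} \end{pmatrix}$$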


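Example. For $(5, -6)^T \in \mathbb{R}^2$ and $(\pi, 0, 3)^T \in \mathbb{R}^3$, their tensor product in $\mathbb{R}^2 \otimes \mathbb{R}^3$
has natural basis coordinates given by


$$\begin{pmatrix} 5 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} \pi \\ 0 \\ 3 \end{pmatrix} \;=\; \begin{pmatrix} 5\pi \\ 0 \\ 15 \\ -6\pi \\ 0 \\ -18 \end{pmatrix} .$$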
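
In code, the coordinate rule is a one-line double loop. The sketch below is only an illustration; it takes the second vector of the example to be $(\pi, 0, 3)^T$, and `kron` is a helper name chosen here rather than anything defined in the course.

```python
import math

def kron(v, w):
    """V-major coordinates of v (x) w: each entry of v multiplies all of w."""
    return [a * b for a in v for b in w]

v = [5, -6]
w = [math.pi, 0, 3]
coords = kron(v, w)   # six coordinates: 5*pi, 0, 15, -6*pi, 0, -18
```

The outer loop runs over the coordinates of v and the inner loop over those of w, which is exactly the "left index slow, right index fast" ordering.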


[Exercise. Give the coordinate representation of the tensor 
14 


—6 |] ® 
3t 


yw hd 


in the natural tensor basis. | 


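Explanation. I'll demonstrate the validity of the formula in the special case
$\mathbb{R}^2 \otimes \mathbb{R}^3$. The derivation will work for any V and W (as the upcoming exercise
shows).


Consider the tensor product of two general vectors,

$$\begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \otimes \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} .$$

Let's zoom in on the meaning of each column vector which is, after all, shorthand
notation for the following,

$$\left[ c_0 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right] \otimes \left[ d_0 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + d_1 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + d_2 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right] .$$

Now apply linearity to equate the above with

$$c_0 d_0 \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \;+\; c_0 d_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \;+\; c_0 d_2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

$$+\; c_1 d_0 \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \;+\; c_1 d_1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \;+\; c_1 d_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} .$$

Next, identify each of the basis tensor products with their symbolic $v_k$ (for $\mathbb{R}^2$) and
$w_j$ (for $\mathbb{R}^3$) to see it more clearly,

$$c_0 d_0 \,(v_0 \otimes w_0) \;+\; c_0 d_1 \,(v_0 \otimes w_1) \;+\; c_0 d_2 \,(v_0 \otimes w_2)$$

$$+\; c_1 d_0 \,(v_1 \otimes w_0) \;+\; c_1 d_1 \,(v_1 \otimes w_1) \;+\; c_1 d_2 \,(v_1 \otimes w_2) .$$


The basis tensors are listed in the conventional (“V-major / W-minor format”) order,
allowing us to write down the coordinates of the tensor,

$$\begin{pmatrix} c_0 d_0 \\ c_0 d_1 \\ c_0 d_2 \\ c_1 d_0 \\ c_1 d_1 \\ c_1 d_2 \end{pmatrix}$$


as claimed. QED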


[Exercise. Replicate the demonstration for the coordinates of a separable tensor
$v \otimes w$ in any product space $V \otimes W$.]


Natural Coordinates of Basis Tensors


By definition and a previous exercise (see the lecture on linear transformations) we
know that each basis vector, $b_k$, when expressed in its own B-coordinates looks exactly
like the kth preferred basis element, i.e.,

$$b_k \;=\; \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \ \longleftarrow \ k\text{th element} .$$
0 


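The same is true for our tensor basis, which means that once we embrace that basis, we
have lm basis vectors

$$\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , \ \ldots , \ \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} , \ \ldots , \ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \qquad (lm \text{ of them}) ,$$

which don't make reference to the two component vector spaces or the inherited V-major /
W-minor ordering we decided to use. However, for this to be useful, we need
an implied correspondence between these vectors and the inherited basis

$$\begin{array}{l} \{\; v_0 \otimes w_0, \ v_0 \otimes w_1, \ v_0 \otimes w_2, \ \ldots, \ v_0 \otimes w_{m-1}, \\ \ \ \; v_1 \otimes w_0, \ v_1 \otimes w_1, \ v_1 \otimes w_2, \ \ldots, \ v_1 \otimes w_{m-1}, \\ \qquad \vdots \\ \ \ \; v_{l-1} \otimes w_0, \ v_{l-1} \otimes w_1, \ v_{l-1} \otimes w_2, \ \ldots, \ v_{l-1} \otimes w_{m-1} \;\} . \end{array}$$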


This is all fine, as long as we remember that those self-referential tensor bases as- 
sume some agreed-upon ordering system, and in this course that will be the V-major 
ordering. 


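To illustrate this, say we are working with the basis vector

$$\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} .$$

In the rare times when we need to relate this back to our original vector spaces we
would count: The 1 is in position 4 (counting from 0), and relative to $\mathbb{R}^2$ and $\mathbb{R}^3$ this
means

$$4 \;=\; 1 \times 3 \;+\; 1 ,$$

yielding the individual basis vectors #1 in V and #1 in W. Therefore, the correspondence is

$$\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} ,$$

which we can confirm by multiplying out the RHS as we learned in the last section

$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 0 \cdot 0 \\ 0 \cdot 1 \\ 0 \cdot 0 \\ 1 \cdot 0 \\ 1 \cdot 1 \\ 1 \cdot 0 \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} .$$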


This will be easy enough with our 2-dimensional component spaces $\mathcal{H}$, but if you
aren't prepared for it, you might find yourself drifting aimlessly when faced with a
long column basis tensor, not knowing what to do with it.
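
The counting argument above is integer division with remainder, so it is easy to automate. Here is a minimal sketch; `factor_basis_tensor` and `unit` are hypothetical helper names, and the V-major ordering is assumed throughout.

```python
def unit(n, i):
    """The i-th natural basis vector of an n-dimensional space."""
    return [1 if p == i else 0 for p in range(n)]

def kron(v, w):
    """V-major coordinates of v (x) w."""
    return [a * b for a in v for b in w]

def factor_basis_tensor(column, l, m):
    """Split a one-hot column of length l*m into (k, j) with column = v_k (x) w_j."""
    assert len(column) == l * m and sorted(column) == [0] * (l * m - 1) + [1]
    position = column.index(1)
    return divmod(position, m)   # k = position // m,  j = position % m

column = [0, 0, 0, 0, 1, 0]
k, j = factor_basis_tensor(column, l=2, m=3)
rebuilt = kron(unit(2, k), unit(3, j))
```

For this column the sketch returns k = 1 and j = 1, and `rebuilt` reproduces the original column, matching the hand computation.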




Natural Coordinates of General Tensors 


Because the tensor space has dimension lm, we know that any tensor in the natural
basis will look like

$$\begin{pmatrix} \zeta_0 \\ \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_{lm-1} \end{pmatrix} .$$


The only thing worth noting here is the correspondence between this and the component
spaces V and W. If we are lucky enough to have a separable tensor in our
hands, this would have the special form

$$\begin{pmatrix} c_0 d_0 \\ c_0 d_1 \\ c_0 d_2 \\ \vdots \\ c_1 d_0 \\ c_1 d_1 \\ c_1 d_2 \\ \vdots \\ c_2 d_0 \\ c_2 d_1 \\ c_2 d_2 \\ \vdots \end{pmatrix} ,$$

and we might be able to figure out the component vectors from this. However, in
general, we don't have separable tensors. All we can say is that this tensor is a linear
combination of the lm basis vectors, and just accept that it has the somewhat random
components, $\zeta_i$, which we might label simply

$$\begin{pmatrix} \zeta_0 \\ \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_{lm-2} \\ \zeta_{lm-1} \end{pmatrix}$$

or we might label with an eye on our component spaces

$$\begin{pmatrix} \zeta_{00} \\ \zeta_{01} \\ \zeta_{02} \\ \vdots \\ \zeta_{0(m-1)} \\ \zeta_{10} \\ \zeta_{11} \\ \zeta_{12} \\ \vdots \\ \zeta_{1(m-1)} \\ \zeta_{20} \\ \zeta_{21} \\ \zeta_{22} \\ \vdots \\ \zeta_{2(m-1)} \\ \vdots \end{pmatrix}$$

with the awareness that these components $\zeta_{kj}$ may not be products of two factors
$c_k d_j$ originating in two vectors $(c_0, \ldots, c_{l-1})^T$ and $(d_0, \ldots, d_{m-1})^T$.


One thing we do know: Any tensor can be written as a weighted sum of, at most,
lm separable tensors. (If this is not immediately obvious, please review the tensor
product basis theorem.)


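Example. Let's compute the coordinates of the non-separable tensor

$$\zeta \;=\; \begin{pmatrix} 3 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$

in the natural basis. We do it by applying the above formula to each separable
component and adding,

$$\zeta \;=\; \begin{pmatrix} 3 \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \\[2ex] -6 \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \end{pmatrix} \;+\; \begin{pmatrix} 1 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \\[2ex] 0 \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \end{pmatrix} \;=\; \begin{pmatrix} 3 \\ 6 \\ 3\pi \\ -6 \\ -12 \\ -6\pi \end{pmatrix} \;+\; \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 3 \\ 7 \\ 1 + 3\pi \\ -6 \\ -12 \\ -6\pi \end{pmatrix} .$$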


Tensors as Matrices 


If the two component spaces have dimension l and m, we know that the product space
has dimension lm. More recently we've been talking about how these lm coordinates
might be organized. Separable or not, a tensor is completely determined by its lm
preferred coefficients, which suggests a rectangle of numbers,

$$\begin{pmatrix} \zeta_{00} & \zeta_{01} & \zeta_{02} & \cdots & \zeta_{0(m-1)} \\ \zeta_{10} & \zeta_{11} & \zeta_{12} & \cdots & \zeta_{1(m-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \zeta_{(l-1)0} & \zeta_{(l-1)1} & \zeta_{(l-1)2} & \cdots & \zeta_{(l-1)(m-1)} \end{pmatrix} ,$$

which happens to have a more organized structure in the separable case,

$$\begin{pmatrix} c_0 d_0 & c_0 d_1 & c_0 d_2 & \cdots & c_0 d_{m-1} \\ c_1 d_0 & c_1 d_1 & c_1 d_2 & \cdots & c_1 d_{m-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{l-1} d_0 & c_{l-1} d_1 & c_{l-1} d_2 & \cdots & c_{l-1} d_{m-1} \end{pmatrix} .$$


This matrix is not to be interpreted as a linear transformation of either component
space; it is just a vector in the product space. (It does have a meaning as a
scalar-valued function, but we'll leave that as a topic for courses in relativity, particle physics
or structural engineering.)


Sometimes the lm column vector model serves tensor imagery best, while other
times the l × m matrix model works better. It's good to be ready to use either one
as the situation demands.
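
Switching between the two pictures is a simple reshape. The sketch below also includes a separability test that is not part of the text above: it relies on the standard fact that the rows of the l × m rectangle are proportional (all 2 × 2 minors vanish) exactly when the tensor is separable. The function names and the tolerance are choices made for this illustration only.

```python
import math

def kron(v, w):
    """V-major coordinates of v (x) w."""
    return [a * b for a in v for b in w]

def as_matrix(coords, l, m):
    """Reshape lm natural coordinates into l rows of m entries (V-major)."""
    return [coords[k * m:(k + 1) * m] for k in range(l)]

def looks_separable(matrix, tol=1e-9):
    """True when every 2x2 minor vanishes, i.e. the rows are proportional."""
    rows, cols = len(matrix), len(matrix[0])
    return all(abs(matrix[a][c] * matrix[b][d] - matrix[a][d] * matrix[b][c]) <= tol
               for a in range(rows) for b in range(a + 1, rows)
               for c in range(cols) for d in range(c + 1, cols))

separable = kron([5, -6], [math.pi, 0, 3])
zeta = [x + y for x, y in zip(kron([3, -6], [1, 2, math.pi]),
                              kron([1, 0], [0, 1, 1]))]

separable_matrix = as_matrix(separable, 2, 3)
zeta_matrix = as_matrix(zeta, 2, 3)
```

Here `looks_separable(separable_matrix)` is true, while `looks_separable(zeta_matrix)` is false: the minor built from the first two columns of `zeta_matrix` is 3·(−12) − 7·(−6) = 6, not 0.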


10.3. Linear Operators on the Tensor Product Space 


The product space is a vector space, and as such supports linear transformations. 
Everything we know about them applies here: they have matrix representations in 
the natural basis, some are unitary, some not, some are Hermitian, some not, some 
have inverses, some don't, etc. The only special and new topic that confronts us is
how the linear transformations of the two component spaces, V and W, inform those
of the product space, $V \otimes W$.


The situation will feel familiar. Just as tensors in $V \otimes W$ fall into two main
classes, those that are separable (products, $v \otimes w$, of two vectors) and those that
aren't (they are built from sums of separable tensors), the operators on $V \otimes W$ have
a corresponding breakdown.


10.3.1 Separable Operators 


If A is a linear transformation (a.k.a. an “operator”) on V

$$A : V \longrightarrow V ,$$

and B is a linear transformation on W

$$B : W \longrightarrow W ,$$

then their tensor product, $A \otimes B$, is a linear transformation on the product space
$V \otimes W$,

$$A \otimes B : V \otimes W \longrightarrow V \otimes W ,$$

defined by its action on the separable tensors

$$[A \otimes B](v \otimes w) \;\equiv\; Av \otimes Bw ,$$

and extended to general tensors linearly.


Note 1: We could have defined $A \otimes B$ first on just the lm basis vectors $v_k \otimes w_j$,
since they span the space. However, it's so useful to remember that we can use this
formula on any two component vectors v and w, that I prefer to make this the official
definition.


Note 2: A and/or B need not map their respective vector spaces into themselves.
For example, perhaps $A : V \to V'$ and $B : W \to W'$. Then $A \otimes B : V \otimes W \to V' \otimes W'$.
However, we will usually encounter the simpler case covered by the definition above.


One must verify that this results in a linear transformation (operator) on the
product space by proving that for any $\zeta, \eta \in V \otimes W$ and scalar c,

$$[A \otimes B](\zeta + \eta) \;=\; [A \otimes B]\,\zeta \;+\; [A \otimes B]\,\eta \qquad \text{and}$$

$$[A \otimes B](c\,\zeta) \;=\; c\,[A \otimes B]\,\zeta .$$

This is very easy to do as an ...


[Exercise. Prove this by first verifying it on separable tensors then showing that
the extension to general tensors preserves the properties.]


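Example. Let A be defined on $\mathbb{R}^2$ by

$$Av \;=\; A \begin{pmatrix} v_0 \\ v_1 \end{pmatrix} \;\equiv\; \begin{pmatrix} v_0 + 2 v_1 \\ v_1 \end{pmatrix}$$

and B be defined on $\mathbb{R}^3$ by

$$Bw \;\equiv\; \pi w \;=\; \begin{pmatrix} \pi w_0 \\ \pi w_1 \\ \pi w_2 \end{pmatrix} .$$

On separable tensors, then, $A \otimes B$ has the effect expressed by

$$[A \otimes B](v \otimes w) \;=\; \begin{pmatrix} v_0 + 2 v_1 \\ v_1 \end{pmatrix} \otimes \begin{pmatrix} \pi w_0 \\ \pi w_1 \\ \pi w_2 \end{pmatrix} ,$$

and this is extended linearly to general tensors. To get specific, we apply $A \otimes B$ to
the tensor

$$\zeta \;=\; \begin{pmatrix} 3 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$

to get

$$[A \otimes B]\,\zeta \;=\; \begin{pmatrix} 3 - 12 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} \pi \\ 2\pi \\ \pi^2 \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ \pi \\ \pi \end{pmatrix} \;=\; \begin{pmatrix} -9 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} \pi \\ 2\pi \\ \pi^2 \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ \pi \\ \pi \end{pmatrix} ,$$

with no simplification obvious. However, an equivalent expression can be formed by
extracting a factor of π:

$$[A \otimes B]\,\zeta \;=\; \pi \left[ \begin{pmatrix} -9 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right] .$$

We can always forsake the separable components and instead express this as a column
vector in the product space by adding the two separable tensors,

$$[A \otimes B]\,\zeta \;=\; \begin{pmatrix} -9 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} \pi \\ 2\pi \\ \pi^2 \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ \pi \\ \pi \end{pmatrix} \;=\; \begin{pmatrix} -9\pi \\ -18\pi \\ -9\pi^2 \\ -6\pi \\ -12\pi \\ -6\pi^2 \end{pmatrix} \;+\; \begin{pmatrix} 0 \\ \pi \\ \pi \\ 0 \\ 0 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} -9\pi \\ -17\pi \\ \pi - 9\pi^2 \\ -6\pi \\ -12\pi \\ -6\pi^2 \end{pmatrix} .$$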
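
The same computation can be sketched in Python by applying A and B term by term, exactly as the definition prescribes. The sketch reads the example's operators as $A(v_0, v_1)^T = (v_0 + 2v_1, v_1)^T$ and $Bw = \pi w$, and represents a general tensor as a list of (v, w) pairs; `apply_separable` is a helper name invented for this illustration.

```python
import math

def apply(matrix, vector):
    """Matrix times column vector, both as plain lists."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

def kron(v, w):
    """V-major coordinates of v (x) w."""
    return [a * b for a in v for b in w]

def apply_separable(A, B, terms):
    """Apply A (x) B to a sum of separable tensors given as (v, w) pairs.

    Returns the natural (V-major) coordinates of the result.
    """
    total = None
    for v, w in terms:
        piece = kron(apply(A, v), apply(B, w))      # (Av) (x) (Bw)
        total = piece if total is None else [a + b for a, b in zip(total, piece)]
    return total

pi = math.pi
A = [[1, 2], [0, 1]]                        # A(v0, v1) = (v0 + 2 v1, v1)
B = [[pi, 0, 0], [0, pi, 0], [0, 0, pi]]    # B w = pi w
zeta_terms = [([3, -6], [1, 2, pi]), ([1, 0], [0, 1, 1])]

result = apply_separable(A, B, zeta_terms)
expected = [-9 * pi, -17 * pi, pi - 9 * pi ** 2, -6 * pi, -12 * pi, -6 * pi ** 2]
```

Up to rounding, `result` should match `expected`, the column vector obtained by hand.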


10.3.2 The Matrix of a Separable Operator 


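We can write down the matrix for any separable operator using a method very similar
to that used to produce the coordinate representation of a separable tensor product of
vectors. As you recall, we used a V-major format which multiplied the entire vector
w by each coordinate of v to build the result. The same will work here. We use an
A-major format, i.e., the left operator's coordinates change more “slowly” than the
right operator's. Stated differently, we multiply each element of the left matrix by
the entire matrix on the right.


$$\begin{pmatrix} a_{00} & a_{01} & \cdots & a_{0(l-1)} \\ a_{10} & a_{11} & \cdots & a_{1(l-1)} \\ \vdots & \vdots & \ddots & \vdots \end{pmatrix} \otimes \begin{pmatrix} b_{00} & b_{01} & \cdots & b_{0(m-1)} \\ b_{10} & b_{11} & \cdots & b_{1(m-1)} \\ \vdots & \vdots & \ddots & \vdots \end{pmatrix}$$

$$\equiv\; \begin{pmatrix} a_{00} \begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & a_{01} \begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & \cdots \\[4ex] a_{10} \begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & a_{11} \begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$


This works based on a V-major column format for the vectors in the product space.
If we had used a W-major column format, then we would have had to define the
product matrix using a B-major rule rather than the A-major rule given above.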
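
Here is a minimal Python sketch of the A-major rule, together with a check that the resulting matrix agrees with the defining formula $[A \otimes B](v \otimes w) = Av \otimes Bw$. The matrices and vectors are hypothetical integer examples chosen so the comparison is exact; `kron_matrix` is a name made up for this illustration.

```python
def kron_matrix(A, B):
    """Matrix of A (x) B in A-major form: each entry of A multiplies all of B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def apply(matrix, vector):
    """Matrix times column vector, both as plain lists."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

def kron(v, w):
    """V-major coordinates of v (x) w."""
    return [a * b for a in v for b in w]

A = [[1, 2], [0, 1]]                      # hypothetical operator on a 2-dim V
B = [[1, 0, 2], [0, 3, 0], [4, 0, 1]]     # hypothetical operator on a 3-dim W
v, w = [3, -6], [1, 2, 5]

AB = kron_matrix(A, B)                         # a 6 x 6 matrix
via_matrix = apply(AB, kron(v, w))             # (A (x) B) acting on v (x) w
via_definition = kron(apply(A, v), apply(B, w))  # (Av) (x) (Bw)
```

The two routes, `via_matrix` and `via_definition`, give the same six coordinates. Pairing an A-major matrix with a W-major column would break this agreement, which is the point of the remark about matching conventions.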


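Example. The matrices for the A and B of our last example are given by

$$A \;=\; \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \qquad \text{and} \qquad B \;=\; \begin{pmatrix} \pi & 0 & 0 \\ 0 & \pi & 0 \\ 0 & 0 & \pi \end{pmatrix} ,$$

so the tensor product transformation $A \otimes B$ is immediately written down as

$$A \otimes B \;=\; \begin{pmatrix} \pi & 0 & 0 & 2\pi & 0 & 0 \\ 0 & \pi & 0 & 0 & 2\pi & 0 \\ 0 & 0 & \pi & 0 & 0 & 2\pi \\ 0 & 0 & 0 & \pi & 0 & 0 \\ 0 & 0 & 0 & 0 & \pi & 0 \\ 0 & 0 & 0 & 0 & 0 & \pi \end{pmatrix} .$$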

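Example. We've already applied the definition of a tensor operator directly to
compute the above operator applied to the tensor

$$\zeta \;=\; \begin{pmatrix} 3 \\ -6 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \\ \pi \end{pmatrix} \;+\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \;=\; \begin{pmatrix} 3 \\ 7 \\ 1 + 3\pi \\ -6 \\ -12 \\ -6\pi \end{pmatrix}$$

and found that, after reducing the result to a column vector,

$$[A \otimes B]\,\zeta \;=\; \begin{pmatrix} -9\pi \\ -17\pi \\ \pi - 9\pi^2 \\ -6\pi \\ -12\pi \\ -6\pi^2 \end{pmatrix} .$$


It now behooves us to do a sanity check. We must confirm that multiplication of $\zeta$
by the imputed matrix for $A \otimes B$ will produce the same result.


$$[A \otimes B]\,\zeta \;=\; \begin{pmatrix} \pi & 0 & 0 & 2\pi & 0 & 0 \\ 0 & \pi & 0 & 0 & 2\pi & 0 \\ 0 & 0 & \pi & 0 & 0 & 2\pi \\ 0 & 0 & 0 & \pi & 0 & 0 \\ 0 & 0 & 0 & 0 & \pi & 0 \\ 0 & 0 & 0 & 0 & 0 & \pi \end{pmatrix} \begin{pmatrix} 3 \\ 7 \\ 1 + 3\pi \\ -6 \\ -12 \\ -6\pi \end{pmatrix} \;=\; \ ?$$


[Exercise. Do the matrix multiplication, fill in the question mark, and see if it
agrees with the column tensor above.]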


10.3.3. The Matrix of a General Operator 


As with the vectors in the product space (i.e., tensors), we can represent any operator
on the product space as a sum of no more than $(lm)^2$ separable operators, since there
are that many elements in a tensor product matrix. The following exercise will make
this clear.


[Exercise. Let the dimensions of our two component spaces be $l$ and $m$. Then
pick two integers $p$ and $q$ in the range $0 \le p, q \le lm - 1$. Show that the matrix

$$
P_{pq},
$$

which has a 1 in position $(p, q)$ and 0 in all other positions, is separable. Hint: You
need to find an $l \times l$ matrix and an $m \times m$ matrix whose tensor product has 1 in the
right position and 0s everywhere else. Start by partitioning $P_{pq}$ into sub-matrices of
size $m \times m$. Which sub-matrix does the lonely 1 fall into? Where in that $m \times m$
sub-matrix does that lonely 1 fall?]


[Exercise. Show that the set of all $(lm)^2$ matrices

$$
\left\{\, P_{pq} \;\middle|\; 0 \le p, q \le lm - 1 \,\right\}
$$

“spans” the set of linear transformations on $V \otimes W$.]


10.3.4 Food for Thought 


Before we move on to multi-qubit systems, here are a few more things you may
wish to ponder.

[Exercise. Is the product of unitary operators unitary in the product space?]

[Exercise. Is the product of Hermitian operators Hermitian in the product
space?]

[Exercise. Is the product of invertible operators invertible in the product space?]




Chapter 11 


Two Qubits and Binary Quantum 
Gates 


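$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle_A |0\rangle_B \;+\; \beta\,|0\rangle_A |1\rangle_B
\;+\; \gamma\,|1\rangle_A |0\rangle_B \;+\; \delta\,|1\rangle_A |1\rangle_B
$$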


11.1 The Jump from One to Two 


Classical Binary Systems 


Elevating a classical one-bit system to a classical two-bit system doesn’t seem to 
require any fancy math. We simply slap together two single bits and start defining 
binary logic gates like AND and OR which take both bits as input. That works 
because classical bits cannot become entangled the way in which quantum bits can. 
Nevertheless, we could use tensor products to define the classical two-bit system if 
we were inclined to see the formal definition. As with the one-bit systems, it would 
be rather dull and its only purpose would be to allow a fair comparison between two 
classical bits and two quantum bits (the latter requiring the tensor product). I won’t 
bother with the formal treatment of two classical bits, but leave that as an exercise 
after you have seen how we do it in the quantum case. 


Quantum Binary Systems 


Tensor algebra allows us to make the leap from the 2-D Hilbert space of one qubit to 
a 4-D Hilbert space of two qubits. Once mastered, advancing to n qubits for n > 2 
is straightforward. Therefore, we move carefully through the n = 2 case, where the 
concepts needed for higher dimensions are easiest to grasp. 




11.2 The State Space for Two Qubits 


11.2.1 Definition of a Two Quantum Bit (“Bipartite”) System


The reason that we “go tensor” for a two-qubit system is that the two bits may 
become entangled (to be defined below). That forces us to treat two bits as if they 
were a single state of a larger state space rather than keep them separate. 


Definition of Two Qubits. A two-qubit system is (any copy of) the
entire product space $\mathcal{H} \otimes \mathcal{H}$.


Definition of a Two-Qubit Value. The “value” or “state” of a
two-qubit system is any unit (or normalized) vector in $\mathcal{H} \otimes \mathcal{H}$.


In other words, two qubits form a single entity, the tensor product space
$\mathcal{H} \otimes \mathcal{H}$, whose value can be any vector (which happens also to be a
tensor) on the projective sphere of that product space.


The two-qubit entity itself is not committed to any particular value until we say
which specific unit-vector in $\mathcal{H} \otimes \mathcal{H}$ we are assigning it.


Vocabulary and Notation 


Two qubits are often referred to as a bipartite system. This term is inherited from 
physics in which a composite system of two identical particles (thus bi-parti-te) can 
be “entangled.” 


To distinguish the two otherwise identical component Hilbert spaces I may use
subscripts, A for the left space and B for the right space,

$$
\mathcal{H}_A \otimes \mathcal{H}_B .
$$

Another notation you might see emphasizes the order of the tensor product, that is,
the number of component spaces (in our current case, two),

$$
\mathcal{H}_{(2)} .
$$


In this lesson, we are concerned with order-2 products, with a brief but important 
section on order-3 products at the very end. 


Finally, note that in the lesson on tensor products, we used the common abstract
names V and W for our two component spaces. In quantum computation the component
spaces are usually called A and B. For example, whereas in that lecture I
talked about “V-major ordering” for the tensor coordinates, I’ll now refer to “A-major
ordering.”




11.2.2 The Preferred Bipartite CBS 


First and foremost, we need to establish symbolism for the computational basis states
(CBS) of our product space. These states correspond to the two bits of classical
computing, and they allow us to think of two ordinary bits as being embedded within
the rich continuum of a quantum bipartite state space.


Symbols for Basis Vectors 


The tensor product of two 2-D vector spaces has dimension 2 x 2 = 4. Its inherited 
preferred basis vectors are the separable products of the component space vectors, 


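$$
\big\{\; |0\rangle \otimes |0\rangle \,,\;\; |0\rangle \otimes |1\rangle \,,\;\; |1\rangle \otimes |0\rangle \,,\;\; |1\rangle \otimes |1\rangle \;\big\}.
$$

These are the CBS of the bipartite system. There are some shorthand alternatives in
quantum computing.

$$
\begin{array}{ccccccc}
|0\rangle \otimes |0\rangle & \longleftrightarrow & |0\rangle |0\rangle & \longleftrightarrow & |00\rangle & \longleftrightarrow & |0\rangle^2 \\
|0\rangle \otimes |1\rangle & \longleftrightarrow & |0\rangle |1\rangle & \longleftrightarrow & |01\rangle & \longleftrightarrow & |1\rangle^2 \\
|1\rangle \otimes |0\rangle & \longleftrightarrow & |1\rangle |0\rangle & \longleftrightarrow & |10\rangle & \longleftrightarrow & |2\rangle^2 \\
|1\rangle \otimes |1\rangle & \longleftrightarrow & |1\rangle |1\rangle & \longleftrightarrow & |11\rangle & \longleftrightarrow & |3\rangle^2
\end{array}
$$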


All three of the alternatives that lack the ® symbol are seen frequently in computer 
science, and we will switch between them freely based on the emphasis that the 
context requires. 


The notation of the first two columns admits the possibility of labeling each of 
the component kets with the H from whence it came, A or B, as in 


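$$
\begin{array}{ccc}
|0\rangle_A \otimes |0\rangle_B & \longleftrightarrow & |0\rangle_A |0\rangle_B \,, \\
|0\rangle_A \otimes |1\rangle_B & \longleftrightarrow & |0\rangle_A |1\rangle_B \,,
\end{array}
$$

etc.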
I will often omit the subscripts A and B when the context is clear and include them
when I want to emphasize which of the two component spaces the vectors come from.
The labels are always expendable since the A-space ket is the one on the left and the
B-space ket is the one on the right. I will even include and/or omit them in the same
string of equalities, since it may be clear in certain expressions, but less so in others:

$$
\begin{aligned}
U\Big( \big( \beta\,|0\rangle_A + \delta\,|1\rangle_A \big)\, |1\rangle_B \Big)
&= U\big( \beta\,|0\rangle |1\rangle + \delta\,|1\rangle |1\rangle \big) \\
&= \beta\,|0\rangle |0\rangle + \delta\,|0\rangle |1\rangle \\
&= |0\rangle_A \big( \beta\,|0\rangle_B + \delta\,|1\rangle_B \big)
\end{aligned}
$$

(This is an equation we will develop later in this lesson.)


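The densest of the notations in the “$\longleftrightarrow$” stack a couple paragraphs back is
the encoded version which expresses the ket as an integer from 0 to 3. We should
reinforce this correspondence at the outset and add the coordinate representation of
each basis ket under the implied A-major ordering of the vectors suggested by their
presentation, above.

$$
\begin{array}{l|cccc}
\text{basis ket} & |0\rangle|0\rangle & |0\rangle|1\rangle & |1\rangle|0\rangle & |1\rangle|1\rangle \\ \hline
\text{encoded} & |0\rangle^2 & |1\rangle^2 & |2\rangle^2 & |3\rangle^2 \\ \hline
\text{coordinates} &
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\end{array}
$$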


This table introduces an exponent-like notation, $|\,\cdot\,\rangle^2$, which is needed mainly in the
encoded form, since an integer representation for a CBS does not disclose its tensor
order (2 in this case) to the reader, while the other representations clearly imply that
we are looking at two qubits.
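The correspondence between the separable form, the encoded integer, and the A-major coordinates can be checked mechanically. The following is a minimal sketch; the helper names `tensor` and `encoded_cbs` and the dictionary `KET` are assumptions introduced only for this illustration.

```python
def tensor(v, w):
    """A-major tensor product of two coordinate vectors given as lists."""
    return [x * y for x in v for y in w]


def encoded_cbs(k, order=2):
    """Coordinates of the encoded CBS ket |k>^order (illustrative helper)."""
    return [1 if index == k else 0 for index in range(2 ** order)]


KET = {0: [1, 0], 1: [0, 1]}   # single-qubit |0> and |1>

for x in (0, 1):
    for y in (0, 1):
        k = 2 * x + y          # the binary digits "xy" read as an integer
        assert tensor(KET[x], KET[y]) == encoded_cbs(k)
        print(f"|{x}>|{y}>  <->  |{k}>^2  <->  {encoded_cbs(k)}")
```

The integer label is simply the position of the lone 1 in the coordinate column, which is why the encoded form needs the exponent to say how long that column is.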


11.2.3. Separable Bipartite States 


The CBS are four special separable tensors. There is an infinity of other separable
tensors in $\mathcal{H} \otimes \mathcal{H}$ of the form

$$
|\psi\rangle \otimes |\varphi\rangle \;\longleftrightarrow\; |\psi\rangle |\varphi\rangle .
$$

Note that, unlike the CBS symbolism, there is no further alternative notation for a
general separable tensor. In particular, $|\psi\varphi\rangle$ makes no sense.


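11.2.4 Alternate Bipartite Bases

Inherited Second Order x-CBS


We can construct other computational bases like the inherited x-basis,

$$
\big\{\; |0\rangle_x |0\rangle_x \,,\;\; |0\rangle_x |1\rangle_x \,,\;\; |1\rangle_x |0\rangle_x \,,\;\; |1\rangle_x |1\rangle_x \;\big\},
$$

or, using the common alternate notation for the x-basis, as

$$
\big\{\; |+\rangle |+\rangle \,,\;\; |+\rangle |-\rangle \,,\;\; |-\rangle |+\rangle \,,\;\; |-\rangle |-\rangle \;\big\}.
$$

You might see it condensed, as

$$
\big\{\; |{+}{+}\rangle \,,\;\; |{+}{-}\rangle \,,\;\; |{-}{+}\rangle \,,\;\; |{-}{-}\rangle \;\big\},
$$

or super condensed (my own notation),

$$
\big\{\; |0\rangle^2_{\pm} \,,\;\; |1\rangle^2_{\pm} \,,\;\; |2\rangle^2_{\pm} \,,\;\; |3\rangle^2_{\pm} \;\big\}.
$$

In this last version, I’m using the subscript $\pm$ to indicate “x basis” and I’m encoding
the ket labels into decimal integers, 0 through 3, for the four CBS states.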


Inherited Second Order Mixed CBS 


While rare, we could inherit from the z-basis for our A-space, and the x-basis for our
B-space, to create the hybrid

$$
\big\{\; |0\rangle |0\rangle_x \,,\;\; |0\rangle |1\rangle_x \,,\;\; |1\rangle |0\rangle_x \,,\;\; |1\rangle |1\rangle_x \;\big\}.
$$

[Exercise. How do we know that this is an orthonormal basis for the product space?]


Notation 


Whenever we choose to label a state as a CBS of a non-standard basis, we leave no
room to label the component spaces, $|0\rangle_x |1\rangle_x$, which then must be inferred from their
position (A left and B right).


[Exception. If we use the alternate notation $|+\rangle = |0\rangle_x$ and $|-\rangle = |1\rangle_x$, then we
once again have room for the labels A and B: $|+\rangle_A |+\rangle_B$, $|-\rangle_A |+\rangle_B$, etc.]


No matter what basis we use, if pressed to show both the individual basis of interest
and label the component state spaces, we could express everything in the z-basis and
thus avoid the subscript conflict entirely, as with

$$
|0\rangle |1\rangle_x \;=\; |0\rangle_A \left( \frac{|0\rangle_B - |1\rangle_B}{\sqrt{2}} \right).
$$


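The basis we use as input(s) to the quantum circuit (and along which we measure
the output of the same circuit) is what we mean when we speak of the computational
basis. By convention, this is the z-basis. It corresponds to the classical bits

$$
\begin{array}{ccc}
|0\rangle |0\rangle & \longleftrightarrow & [00] \\
|0\rangle |1\rangle & \longleftrightarrow & [01] \\
|1\rangle |0\rangle & \longleftrightarrow & [10] \\
|1\rangle |1\rangle & \longleftrightarrow & [11]
\end{array}
$$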


Nevertheless, in principle there is nothing preventing us from having hardware that 
measures output along a different orthonormal basis. 


[Future Note. In fact, in the later courses, CS 83B and 83C, we will be consid- 
ering measurements which are not only made with reference to a different observable 
(than S) but are not even bases: they are neither orthonormal nor linearly inde- 
pendent. These go by the names general measurement operators or positive operator 
valued measures, and are the theoretical foundation for encryption and error correc- 
tion. ] 




Example 


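We expand the separable state

$$
|0\rangle_x |1\rangle_y
$$

along the (usual) computational basis in $\mathcal{H} \otimes \mathcal{H}$.

$$
|0\rangle_x |1\rangle_y
\;=\; \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right) \left( \frac{|0\rangle - i\,|1\rangle}{\sqrt{2}} \right)
\;=\; \frac{|00\rangle - i\,|01\rangle + |10\rangle - i\,|11\rangle}{2}
$$

As a sanity check we compute the modulus-squared of the product directly, i.e., by
summing the magnitude-squared of the four amplitudes,

$$
\big\|\, |0\rangle_x |1\rangle_y \,\big\|^2
\;=\; \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^{*}
+ \left(\tfrac{-i}{2}\right)\left(\tfrac{-i}{2}\right)^{*}
+ \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^{*}
+ \left(\tfrac{-i}{2}\right)\left(\tfrac{-i}{2}\right)^{*}
\;=\; \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4}
\;=\; 1,
$$

as it darn well had better be. ✓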


This is the kind of test you can perform in the midst of a large computation to 
be sure you haven’t made an arithmetic error. 
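This kind of check is easy to automate. The sketch below recomputes the four amplitudes of $|0\rangle_x |1\rangle_y$ and their modulus-squared sum numerically; the helper `tensor` and the variable names are assumptions made for the illustration, and the component kets are entered in their z-basis coordinates as used in the expansion above.

```python
import math


def tensor(v, w):
    """A-major tensor product of two coordinate vectors given as lists."""
    return [x * y for x in v for y in w]


s = 1 / math.sqrt(2)
ket0_x = [s, s]          # |0>_x = (|0> + |1>)/sqrt(2)
ket1_y = [s, -1j * s]    # |1>_y = (|0> - i|1>)/sqrt(2)

# Amplitudes in the order |00>, |01>, |10>, |11>.
amplitudes = tensor(ket0_x, ket1_y)
norm_squared = sum(abs(c) ** 2 for c in amplitudes)

print(amplitudes)
print(norm_squared)
```

Up to floating-point rounding, the amplitudes are 1/2, -i/2, 1/2, -i/2 and the sum is 1.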


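We might have computed the modulus-squared of the tensor product using one
of two other techniques, and it won’t hurt to confirm that we get the same 1 as an
answer. The techniques are


• the definition of inner product in the tensor space (the product of component
inner-products),

$$
\big\|\, |0\rangle_x |1\rangle_y \,\big\|^2
\;=\; \Big\langle\, |0\rangle_x \otimes |1\rangle_y \;\Big|\; |0\rangle_x \otimes |1\rangle_y \,\Big\rangle
\;=\; {}_x\langle 0 | 0 \rangle_x \,\cdot\, {}_y\langle 1 | 1 \rangle_y
\;=\; 1 \cdot 1 \;=\; 1 \quad ✓
$$

• or the adjoint conversion rules to form the left bra for the inner product,

$$
\big\|\, |0\rangle_x |1\rangle_y \,\big\|^2
\;=\; \big( {}_y\langle 1| \; {}_x\langle 0| \big) \big( |0\rangle_x \, |1\rangle_y \big)
\;=\; {}_y\langle 1| \, \big( {}_x\langle 0 | 0 \rangle_x \big) \, |1\rangle_y
\;=\; {}_y\langle 1 | 1 \rangle_y
\;=\; 1 \quad ✓
$$

However, neither of these would have been as thorough a check of our arithmetic as
the first approach.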


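[Exercise. Expand the separable state

$$
\big( \sqrt{.1}\,|0\rangle + i\sqrt{.9}\,|1\rangle \big) \otimes \big( i\sqrt{.7}\,|0\rangle + \sqrt{.3}\,|1\rangle \big)
$$

along the computational basis in $\mathcal{H} \otimes \mathcal{H}$. Confirm that it is a unit tensor.]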




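How the Second Order x-Basis Looks when Expressed in the Natural Basis

If we combine the separable expression of the x-CBS kets,

$$
|+\rangle |+\rangle \,, \quad |+\rangle |-\rangle \,, \quad |-\rangle |+\rangle \,, \quad \text{and} \quad |-\rangle |-\rangle \,,
$$

with the expansion of each component ket along the natural basis,

$$
|+\rangle \;=\; |0\rangle_x \;=\; \frac{|0\rangle + |1\rangle}{\sqrt{2}}
\qquad \text{and} \qquad
|-\rangle \;=\; |1\rangle_x \;=\; \frac{|0\rangle - |1\rangle}{\sqrt{2}} \,,
$$

the four x-kets look like this, when expanded along the natural basis:

$$
\begin{aligned}
|+\rangle |+\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle + |1\rangle|1\rangle}{2} \,, \\[1ex]
|+\rangle |-\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle + |1\rangle|0\rangle - |1\rangle|1\rangle}{2} \,, \\[1ex]
|-\rangle |+\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle - |1\rangle|0\rangle - |1\rangle|1\rangle}{2} \,, \\[1ex]
|-\rangle |-\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle - |1\rangle|0\rangle + |1\rangle|1\rangle}{2} \,.
\end{aligned}
$$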


Notice something here that we will use in a future lecture: 


When expanded along the z-basis, the x-basis kets have equal numbers of
+ and − terms except for the zeroth CBS ket, $|00\rangle_x$, whose coordinates
all carry a + sign.


I know it sounds silly when you say it out loud, but believe me, it will be very useful. 
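The sign pattern can be tabulated directly from the component kets. This is a small sketch under the same A-major convention; the names `tensor`, `PLUS`, `MINUS`, and `sign_patterns` are assumptions introduced for the illustration.

```python
import math


def tensor(v, w):
    """A-major tensor product of two coordinate vectors given as lists."""
    return [x * y for x in v for y in w]


s = 1 / math.sqrt(2)
PLUS, MINUS = [s, s], [s, -s]   # |+> = |0>_x and |-> = |1>_x in z-coordinates

x_basis = {
    "++": tensor(PLUS, PLUS),
    "+-": tensor(PLUS, MINUS),
    "-+": tensor(MINUS, PLUS),
    "--": tensor(MINUS, MINUS),
}

# Record only the sign of each z-basis coordinate, in the order 00, 01, 10, 11.
sign_patterns = {
    label: "".join("+" if c > 0 else "-" for c in ket)
    for label, ket in x_basis.items()
}

for label, pattern in sign_patterns.items():
    print(label, pattern)
```

Only the "++" ket has four plus signs; each of the other three has two of each.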


11.2.5 Non-Separable Bipartite Tensors 


The typical tensor in a two-qubit system is a normalized finite sum of separable
tensors; that’s how we started defining product tensors. While such a tensor is not
necessarily separable, it can at least be expressed as a superposition of the four CBS
kets,

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle .
$$

The “exponent 2” on the LHS is, as mentioned earlier, a clue to the reader that $|\psi\rangle$
lives in a second-order tensor product space, a detail that might not be clear without
looking at the RHS. In particular, nothing is being “squared.”




11.2.6 Usual Definition of Two Qubits and their Values 


This brings us to the common definition of a two-qubit system, which avoids the 
above formalism. 


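Alternative Definition of Two Qubits. “Two qubits” are represented
by a variable superposition of the four tensor basis vectors of $\mathcal{H} \otimes \mathcal{H}$,

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle
$$

where the complex scalars satisfy

$$
|\alpha|^2 + |\beta|^2 + |\gamma|^2 + |\delta|^2 \;=\; 1 .
$$

Using our alternate notation, we can write this superposition either as

$$
|\psi\rangle^2 \;=\; \alpha\,|00\rangle + \beta\,|01\rangle + \gamma\,|10\rangle + \delta\,|11\rangle
$$

or

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle^2 + \beta\,|1\rangle^2 + \gamma\,|2\rangle^2 + \delta\,|3\rangle^2 .
$$

We may also use alternate notation for scalars, especially when we prepare for
higher-order product spaces:

$$
|\psi\rangle^2 \;=\; c_{00}\,|00\rangle + c_{01}\,|01\rangle + c_{10}\,|10\rangle + c_{11}\,|11\rangle
$$

or

$$
|\psi\rangle^2 \;=\; c_0\,|0\rangle^2 + c_1\,|1\rangle^2 + c_2\,|2\rangle^2 + c_3\,|3\rangle^2 .
$$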


11.3. Fundamental Two-Qubit Logic Gates 


11.3.1 Binary Quantum Operators 
Definition and Terminology 


A binary quantum operator is a unitary transformation, U, on the
two-qubit system $\mathcal{H} \otimes \mathcal{H}$.


As you can see, with all the vocabulary and skills we have mastered, definitions 
can be very short now and still have significant content. For instance, we already 
know that some binary quantum operators will be separable and others will not. The 
simplest and most common gate, in fact, is not separable. 


Binary quantum operators also go by the names two-qubit gates, binary qubit 
operators, bipartite operators and various combinations of these. 




Complete Description of Binary Quantum Operators 


When we study specific binary qubit gates, we will do five things:


• Symbol : show the symbol for that gate,

• $|x\rangle|y\rangle$ : define the gate on the computational basis states,

• $(\cdots)$ : construct the matrix for the gate,

• $|\psi\rangle^2$ : examine the behavior of the gate on a general state (i.e., one that is not
necessarily separable), and

• Measurement : discuss the measurement probabilities of the output registers.


11.3.2 General Learning Example 

We'll go through the checklist on a not-particularly-practical gate, but one that has 
the generality needed to cover future gates. 

The Symbol


Every binary qubit gate has two input lines, one for each input qubit, and two output
lines, one for each output qubit. The label for the unitary transformation associated
with the gate, say U, is placed inside a box connected to its inputs and outputs.


[Circuit diagram: a box labeled U with two input lines on the left and two output lines on the right.]


Although the data going into the two input lines can become “entangled” inside the
gate, we consider the top half of the gate to be a separate register from the lower
half. This can be confusing to new students, as we can’t usually consider each output
line to be independent of its partner the way the picture suggests. More (a lot more)
about this shortly.


Vocabulary. The top input/output lines form an upper A register (or A channel)
while the bottom form a lower B register (or B channel).


[Circuit diagram: the box U with “A-register in” and “A-register out” on the top line and “B-register in” and “B-register out” on the bottom line.]


The labels A and B are usually implied, not written.


$|x\rangle|y\rangle$ : Action on the CBS


Every operator is defined by its action on the basis, and in our case that’s the
computational basis. For binary gates, the symbolism for the general CBS is

$$
|x\rangle \otimes |y\rangle \;=\; |x\rangle |y\rangle \,, \qquad \text{where } x, y \in \{0, 1\}.
$$

To demonstrate this on our “learning” gate, U, we define its action on the CBS, which
in turn defines the gate:

$$
\begin{array}{ccc}
|x\rangle & \longmapsto & |\neg y\rangle \\
|y\rangle & \longmapsto & |\neg x \oplus y\rangle
\end{array}
\qquad \text{(upper and lower lines of the gate } U \text{)}
$$

It is very important to treat the LHS as a single two-qubit input state, not two
separate single qubits, and likewise with the output. In other words, it is really
saying

$$
U\big( |x\rangle \otimes |y\rangle \big) \;=\; |\neg y\rangle \otimes |\neg x \oplus y\rangle
$$

or, using shorter notation,

$$
U\big( |x\rangle |y\rangle \big) \;=\; |\neg y\rangle \, |\neg x \oplus y\rangle .
$$

Furthermore, $|x\rangle|y\rangle$ only represents the four CBS, so we have to extend this linearly
to the entire Hilbert space.


Let’s make this concrete. Taking one of the four CBS, say $|10\rangle$, the above definition
tells us to substitute $1 \to x$ and $0 \to y$, to get the gate’s output,

$$
U\big( |1\rangle |0\rangle \big) \;=\; |\neg 0\rangle \, |\neg 1 \oplus 0\rangle \;=\; |1\rangle |0\rangle .
$$

[Exercise. Compute the effect of U on the other three CBS.]
$(\cdots)$ : The Matrix


In our linear transformation lesson, we proved that the matrix $M_T$ that represents
an operator $T$ can be written by applying $T$ to each basis vector, $a_k$, and placing the
answer vectors in the columns of the matrix,

$$
M_T \;=\; \Big(\, T(a_1) \,,\; T(a_2) \,,\; \ldots \,,\; T(a_n) \,\Big).
$$

$T(v)$ is then just $M_T \cdot v$. Applying the technique to U and the CBS $\{ |x\rangle|y\rangle \}$ we get

$$
M_U \;=\; \Big(\, U|00\rangle \,,\; U|01\rangle \,,\; U|10\rangle \,,\; U|11\rangle \,\Big).
$$


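Each of these columns must be turned into the coordinate representation (in the
inherited tensor basis) of the four U-values. Let’s compute them. (Spoiler alert: this
was the last exercise):

$$
U|00\rangle \;=\; |\neg 0\rangle \, |\neg 0 \oplus 0\rangle \;=\; |1\rangle \, |1 \oplus 0\rangle \;=\; |1\rangle |1\rangle
\;=\; \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
$$

Similarly, we get

$$
\begin{aligned}
U|01\rangle &= |\neg 1\rangle \, |\neg 0 \oplus 1\rangle = |0\rangle \, |1 \oplus 1\rangle = |0\rangle |0\rangle
= \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \\[1ex]
U|10\rangle &= |\neg 0\rangle \, |\neg 1 \oplus 0\rangle = |1\rangle \, |0 \oplus 0\rangle = |1\rangle |0\rangle
= \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \\[1ex]
U|11\rangle &= |\neg 1\rangle \, |\neg 1 \oplus 1\rangle = |0\rangle \, |0 \oplus 1\rangle = |0\rangle |1\rangle
= \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\end{aligned}
$$

giving us the matrix

$$
M_U \;=\; \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix},
$$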


which is, indeed, unitary. Incidentally, not every recipe you might conjure for the four
values $U(|x\rangle|y\rangle)$ will produce a unitary matrix, so not every recipe yields a
reversible (and thus valid) quantum gate. (We learned last time that non-unitary matrices do
not keep state vectors on the projective sphere and therefore do not correspond to
physically sensible quantum operations.)
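The column-by-column construction and the unitarity check can both be sketched in code. Everything named here is an assumption made for illustration: `learning_gate` encodes the CBS rule for U given above, `and_or_recipe` is a hypothetical non-reversible recipe invented only for contrast (it is not from the text), and `matrix_from_cbs_action` and `is_unitary` are helper names of my own.

```python
def learning_gate(x, y):
    """CBS action of the learning gate U: |x>|y> -> |NOT y>|(NOT x) XOR y>."""
    return 1 - y, (1 - x) ^ y


def and_or_recipe(x, y):
    """A hypothetical, non-reversible recipe used only for contrast."""
    return x & y, x | y


def matrix_from_cbs_action(action):
    """Place the coordinates of action(|x>|y>) in column 2x + y (A-major)."""
    matrix = [[0] * 4 for _ in range(4)]
    for column in range(4):
        x, y = divmod(column, 2)
        out_x, out_y = action(x, y)
        matrix[2 * out_x + out_y][column] = 1
    return matrix


def is_unitary(matrix, tol=1e-12):
    """Check entry by entry that (M-dagger times M) is the identity."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            entry = sum(complex(matrix[k][i]).conjugate() * matrix[k][j]
                        for k in range(n))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True


M_U = matrix_from_cbs_action(learning_gate)
for row in M_U:
    print(row)
print(is_unitary(M_U))
print(is_unitary(matrix_from_cbs_action(and_or_recipe)))
```

The contrast recipe sends both $|01\rangle$ and $|10\rangle$ to the same output ket, so two of its columns coincide and the check fails, which is the matrix-level symptom of an irreversible rule.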


[Exercise. Go through the same steps on the putative operator defined by

$$
U\big( |x\rangle |y\rangle \big) \;=\; |x \oplus \neg y\rangle \, |\neg x \oplus y\rangle .
$$

Is this matrix unitary? Does U constitute a realizable quantum gate?]


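$|\psi\rangle^2$ : Behavior on General State


The general state is a superposition of the four basis kets,

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,
$$

whose coordinates are

$$
|\psi\rangle^2 \;=\; \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix},
$$

so matrix multiplication gives

$$
U|\psi\rangle^2 \;=\;
\begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\;=\; \begin{pmatrix} \beta \\ \delta \\ \gamma \\ \alpha \end{pmatrix}
\;=\; \beta\,|0\rangle|0\rangle + \delta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \alpha\,|1\rangle|1\rangle .
$$

Evidently, U leaves the amplitude of the CBS ket, $|1\rangle|0\rangle$, alone and permutes the
other three amplitudes.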


Measurement


[Note: Everything in this section, and in general when we speak of measurement,
assumes that we are measuring the states relative to the natural computational basis,
the z-basis, unless explicitly stated otherwise. This is $\{ |0\rangle, |1\rangle \}$ for a single register
and $\{ |00\rangle, |01\rangle, |10\rangle, |11\rangle \}$ for both registers. We can measure relative to other
CBSs, but then some of the results below cannot be applied exactly as stated. Of
course, I would not finish the day without giving you an example.]

Now that we know what U does to input states we can make some basic
observations about how it changes the measurement probabilities. Consider a state’s
amplitudes both before and after the application of the gate (access points P and Q,
respectively):


[Circuit diagram: the state $|\psi\rangle^2$ enters the gate U at access point P, and $U|\psi\rangle^2$ leaves it at access point Q.]


Easy Observations by Looking at Expansion Coefficients. Trait #6 (QM’s
fourth postulate) tells us that a measurement of the input state (point P)

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,
$$

collapses it to $|00\rangle$ with probability $|\alpha|^2$. Meanwhile, a look at the $U|\psi\rangle^2$’s amplitudes
(point Q),

$$
U|\psi\rangle^2 \;=\; \beta\,|0\rangle|0\rangle + \delta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \alpha\,|1\rangle|1\rangle ,
$$

reveals that measuring the output there will land it on the state $|00\rangle$ with probability
$|\beta|^2$. This was the input’s probability of landing on $|01\rangle$ prior to the gate; U has
shifted the probability that a ket will register a “01” on our meter to the probability
that it will register a “00.” In contrast, a glance at $|\psi\rangle^2$’s pre- and post-U amplitudes
of the CBS $|10\rangle$ tells us that the probability of this state being measured after U is
the same as before: $|\gamma|^2$.
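The bookkeeping of probabilities at points P and Q can be sketched numerically. The amplitudes below are hypothetical values chosen so that the four probabilities are 0.1, 0.2, 0.3 and 0.4 (the phases are arbitrary); the helper names `apply` and `probabilities` are likewise assumptions for the illustration, and `M_U` is the learning gate's matrix in the A-major CBS ordering.

```python
import math

# Matrix of the learning gate U in the ordering |00>, |01>, |10>, |11>.
M_U = [[0, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [1, 0, 0, 0]]


def apply(matrix, state):
    """Multiply a square matrix (list of rows) by a coordinate vector."""
    return [sum(m * c for m, c in zip(row, state)) for row in matrix]


def probabilities(state):
    """Modulus-squared of each amplitude."""
    return [abs(c) ** 2 for c in state]


# Hypothetical amplitudes (alpha, beta, gamma, delta).
psi = [math.sqrt(0.1), 1j * math.sqrt(0.2), -math.sqrt(0.3), math.sqrt(0.4)]

before = probabilities(psi)              # access point P
after = probabilities(apply(M_U, psi))   # access point Q

print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

The probability attached to “10” is the same in both lists, while the other three have been permuted, in agreement with the amplitude permutation derived above.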

Measurement of Separable Output States. By looking at the expansion
coefficients of the general output state, we can usually concoct a simple input state
that produces a separable output. For example, taking $\gamma = \alpha = 0$ gives a separable
input, $\big( \beta\,|0\rangle_A + \delta\,|1\rangle_A \big) \otimes |1\rangle_B$, as well as the following separable output:

$$
\begin{aligned}
U\Big( \big( \beta\,|0\rangle_A + \delta\,|1\rangle_A \big)\, |1\rangle_B \Big)
&= U\big( \beta\,|0\rangle|1\rangle + \delta\,|1\rangle|1\rangle \big)
= \beta\,|0\rangle|0\rangle + \delta\,|0\rangle|1\rangle \\
&= |0\rangle_A \big( \beta\,|0\rangle_B + \delta\,|1\rangle_B \big)
\end{aligned}
$$

We consider a post-gate measurement at access point Q:


[Circuit diagram: $\beta|0\rangle + \delta|1\rangle$ enters the A line and $|1\rangle$ enters the B line of U; $|0\rangle$ leaves on the A line and $\beta|0\rangle + \delta|1\rangle$ leaves on the B line, with access point Q marked at the output.]


Measuring the A-register at the output (point Q) will yield a “0” with certainty
(the coefficient of the separable CBS component, $|0\rangle$, is 1) yet will tell us nothing
about the B-register, which has a $|\beta|^2$ probability of yielding a “0” and $|\delta|^2$ chance of
yielding a “1”, just as it did before we measured A. Similarly, measuring B at point
Q will collapse that output register into one of the two B-space CBS states (with
the probabilities $|\beta|^2$ and $|\delta|^2$) but will not change a subsequent measurement of the
A-register output, still certain to show us a “0”.


(If this seems as though I’m jumping to conclusions, it will be explained formally
when we get to the Born rule, below.)


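A slightly less trivial separable output state results from the input,

$$
\begin{aligned}
|\psi\rangle^2
&= \left(\tfrac{\sqrt{2}}{4}\right) |00\rangle + \left(\tfrac{\sqrt{6}}{4}\right) |01\rangle
 + \left(\tfrac{\sqrt{2}}{4}\right) |10\rangle + \left(\tfrac{\sqrt{6}}{4}\right) |11\rangle \\[1ex]
&= \left( \frac{|0\rangle_A + |1\rangle_A}{\sqrt{2}} \right) \otimes \left( \frac{|0\rangle_B + \sqrt{3}\,|1\rangle_B}{2} \right).
\end{aligned}
$$

(As it happens, this input state is separable, but that’s not required to produce a
separable output state, the topic of this example. I just made it so to add a little
symmetry.)


The output state can be written down instantly by permuting the amplitudes
according to U’s formula,

$$
\begin{aligned}
U|\psi\rangle^2
&= \left(\tfrac{\sqrt{6}}{4}\right) |00\rangle + \left(\tfrac{\sqrt{6}}{4}\right) |01\rangle
 + \left(\tfrac{\sqrt{2}}{4}\right) |10\rangle + \left(\tfrac{\sqrt{2}}{4}\right) |11\rangle \\[1ex]
&= \left( \frac{\sqrt{3}\,|0\rangle_A + |1\rangle_A}{2} \right) \otimes \left( \frac{|0\rangle_B + |1\rangle_B}{\sqrt{2}} \right),
\end{aligned}
$$

and I have factored it for you, demonstrating the output state’s separability.
Measuring either output register at access point Q,


[Circuit diagram: $(|0\rangle + |1\rangle)/\sqrt{2}$ enters the A line and $(|0\rangle + \sqrt{3}\,|1\rangle)/2$ enters the B line of U; $(\sqrt{3}\,|0\rangle + |1\rangle)/2$ leaves on the A line and $(|0\rangle + |1\rangle)/\sqrt{2}$ leaves on the B line, with access point Q marked at the output.]


has a non-zero probability of yielding one of the two CBS states for its respective $\mathcal{H}$,
but it won’t affect the measurement of the other output register. For example,
measuring the B-qubit-out will land it in $|0\rangle_B$ or $|1\rangle_B$ with equal probability. Regardless
of which result we get, it will not affect a future measurement of the A-qubit-out which
has a 3/4 chance of measuring “0” and 1/4 chance of showing us a “1.”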


Measuring One Register of a Separable State. This is characteristic of 
separable states, whether they be input or output. Measuring either register does not 
affect the probabilities of the other register. It only collapses the component vector of 
the tensor, leaving the other vector un-collapsed. 

11.3.3. Quantum Entanglement 
The Measurement of Non-Separable Output States 


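As a final example before we get into real gates, we look at what happens when we try
to measure a non-separable output state of our general learning circuit just presented.
Consider the input,

$$
|\psi\rangle^2 \;=\; \left(\tfrac{1}{\sqrt{2}}\right) |00\rangle + \left(\tfrac{1}{\sqrt{2}}\right) |01\rangle
\;=\; |0\rangle_A \left( \frac{|0\rangle_B + |1\rangle_B}{\sqrt{2}} \right),
$$

a separable state that we also know under the alias,

$$
|0\rangle |0\rangle_x .
$$

Applying U, this time using matrix notation (for fun and practice), yields

$$
U|\psi\rangle^2 \;=\;
\begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\;=\; \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\;=\; \frac{|0\rangle_A |0\rangle_B + |1\rangle_A |1\rangle_B}{\sqrt{2}} \,,
$$

clearly not factorable. Furthermore, unlike the separable output states we have
studied, a measurement of either register forces its partner to collapse.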


[Circuit diagram: the output of U, with access point Q marked.]


For example, if we measure B’s output, and find it to be in state $|1\rangle_B$, since the
output ket has only one CBS tensor associated with that $|1\rangle_B$, namely $|1\rangle_A |1\rangle_B$, as
we can see from its form

$$
\frac{|0\rangle|0\rangle + |1\rangle|1\rangle}{\sqrt{2}} \,,
$$

we are forced to conclude that the A-register must have collapsed into its $|1\rangle$ state. If
this is not clear to you, imagine that the A-register had not collapsed to $|1\rangle$. It would
then be possible to measure a “0” in the A-register. However, such a turn of events
would have landed a $|1\rangle$ in the B-register and $|0\rangle$ in the A-register, a combination
that is patently absent from the output ket’s CBS expansion, above.


Stated another way (if you are still unsure), there is only one bipartite state here,
and if, when expanded along the CBS, one of the four CBS kets is missing
from that expansion, that CBS ket has a zero probability of being the result of a
measurement collapse. Since $|0\rangle|1\rangle$ is not in the expansion, this state is not accessible
through a measurement. (And by the way, the same goes for $|1\rangle|0\rangle$.)


Definition of Quantum Entanglement 


An entangled state in a product space is one that is not separable. 
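For two qubits there is a quick numerical test of this definition. It is a standard linear-algebra fact rather than something developed in the text above: a nonzero state with amplitudes (α, β, γ, δ) is separable exactly when αδ − βγ = 0. One direction is easy to see, since multiplying out (a|0⟩ + b|1⟩) ⊗ (c|0⟩ + d|1⟩) gives amplitudes (ac, ad, bc, bd), for which αδ = βγ. The function name `is_separable` and the tolerance are assumptions made for this sketch.

```python
import math


def is_separable(state, tol=1e-12):
    """Two-qubit separability test: alpha*delta - beta*gamma == 0.

    `state` holds the amplitudes of |00>, |01>, |10>, |11> in that order.
    This determinant criterion is specific to two qubits.
    """
    alpha, beta, gamma, delta = state
    return abs(alpha * delta - beta * gamma) < tol


s = 1 / math.sqrt(2)
input_state = [s, s, 0, 0]     # |0>|0>_x, the input of the last example
output_state = [s, 0, 0, s]    # (|00> + |11>)/sqrt(2), its image under U

print(is_separable(input_state))
print(is_separable(output_state))
```

Applied to the last example, the test reports the input as separable and the output as entangled.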




Non-Locality 


Entangled states are also said to be non-local, meaning that if you are in a room 
with only one of the two registers, you do not have full control over what happens 
to the data there; an observer of the other register in a different room may measure 
his qubit and affect your data even though you have done nothing. Furthermore, if 
you measure the data in that register, your efforts are not confined to your room but 
extend to the outside world where the other register is located. Likewise, separable 
states are considered local, since they do allow full segregation of the actions on 
separate registers. Each observer has total control of the destiny of his register, and 
his actions don’t affect the other observer. 


The Entanglement Connection 


Non-separable states are composed of entangled constituents. While each constituent 
may be physically separated in space and time from its partner in the other register, 
the two parts do not have independent world lines. Whatever happens to one affects 
the other. 


This is the single most important and widely used phenomenon in quantum com- 
puting, so be sure to digest it well. 


Partial Collapse 


In this last example a measurement and collapse of one register completely determined 
the full and unique state of the output. However, often things are subtler. Measuring 
one register may have the effect of only partially collapsing its partner. We'll get to 
that when we take up the Born rule. 


Measurements Using a Different CBS 


Everything we’ve done in the last two sections is true as long as we have a consistent 
CBS from start to finish. The definition of U, its matrix and the measurements have 
all used the same CBS. But funny things happen if we use a different measurement 
basis than the one used to define the operator or express its matrix. Look for an 
example in a few minutes. 


Now let’s do everything again, this time for a famous gate. 


11.3.4 The Controlled-NOT (CNOT) Gate 


This is the simplest and most commonly used binary gate in quantum computing. 




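The Symbol


The gate is usually drawn without an enclosing box as in


[Circuit diagram: a control dot on the upper line joined by a vertical line to a $\oplus$ on the lower line.]


but I sometimes box it,


[Circuit diagram: the same symbol drawn inside an enclosing box.]


The A-register is often called the control bit, and the B-register the target bit,


“control bit” → • (upper line)
“target bit” → $\oplus$ (lower line)


We’ll explain that terminology in the next bullet.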


$|x\rangle|y\rangle$ : Action on the CBS


The CNOT gate has the following effect on the computational basis states:

$$
\begin{array}{ccc}
|x\rangle & \longmapsto & |x\rangle \\
|y\rangle & \longmapsto & |x \oplus y\rangle
\end{array}
\qquad \text{(control line above, target line below)}
$$

When viewed on the tiny set of four CBS tensors, it appears to leave the A-register
unchanged and to negate the B-register qubit or leave it alone, based on whether the
A-register is $|1\rangle$ or $|0\rangle$:

$$
|y\rangle \;\longmapsto\;
\begin{cases}
|y\rangle \,, & \text{if } x = 0 \\
|\neg y\rangle \,, & \text{if } x = 1
\end{cases}
$$

Because of this, the gate is described as a controlled-NOT operator. The A-register is
called the control bit or control register, and the B-register is the target bit or target
register. We cannot use this simplistic description on a general state, however, as the
next sections will demonstrate.


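$(\cdots)$ : The Matrix


We compute the column vectors of the matrix by applying CNOT to the CBS tensors
to get

$$
M_{\text{CNOT}} \;=\; \Big(\, \text{CNOT}|00\rangle \,,\; \text{CNOT}|01\rangle \,,\; \text{CNOT}|10\rangle \,,\; \text{CNOT}|11\rangle \,\Big)
\;=\; \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix},
$$

which, as always, we must recognize as unitary.


[Exercise. Prove that this is not a separable operator. Hint: What did we say
about the form of a separable operator’s matrix?]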


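$|\psi\rangle^2$ : Behavior on General State


Applying CNOT to the general state,

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,
$$

we get

$$
\text{CNOT}\,|\psi\rangle^2 \;=\;
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\;=\; \begin{pmatrix} \alpha \\ \beta \\ \delta \\ \gamma \end{pmatrix}
\;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \delta\,|1\rangle|0\rangle + \gamma\,|1\rangle|1\rangle .
$$

A Meditation. If you are tempted to read on, feeling that you understand
everything we just covered, see how quickly you can answer this:


[Exercise. The CNOT is said to leave the source register unchanged and flip
the target register only if the source register input is $|1\rangle$. Yet the matrix for CNOT
seems to always swap the last two amplitudes, $\gamma \leftrightarrow \delta$, of any ket. Explain this.]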


Caution. If you cannot do the last exercise, you should not continue reading, but 
review the last few sections or ask a colleague for assistance until you see the light. 
This is an important consequence of what we just covered. It is best that you apply 
that knowledge to solve it rather than my blurting out the answer for you. 




Measurement


First, we’ll consider the amplitudes before and after the application of CNOT (access
points P and Q, respectively):


[Circuit diagram: the state $|\psi\rangle^2$ enters the CNOT gate at access point P, and $\text{CNOT}\,|\psi\rangle^2$ leaves it at access point Q.]


A measurement of the input state (point P),

$$
|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,
$$

will yield a “00” with probability $|\alpha|^2$ and “01” with probability $|\beta|^2$. A post-gate
measurement of $\text{CNOT}\,|\psi\rangle^2$ (point Q),

$$
\text{CNOT}\,|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \delta\,|1\rangle|0\rangle + \gamma\,|1\rangle|1\rangle ,
$$

will yield those first two readings with the same probabilities since their kets’
respective amplitudes are not changed by the gate. However, the probabilities of getting a
“10” vs. a “11” reading are swapped. They go from $|\gamma|^2$ and $|\delta|^2$ before the gate to
$|\delta|^2$ and $|\gamma|^2$, after.


Separable Output States. There’s nothing new to say here, as we have covered
all such states in our learning example. Whenever we have a separable output state,
measuring one register has no effect on the other register. So while a measurement
of A causes it to collapse, B will continue to be in a superposition state until we
measure it (and vice versa).


Quantum Entanglement for CNOT 


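A separable bipartite state going into a CNOT gate does not usually result in a separable
state coming out of CNOT. To see this, consider the separable state

$$
|\psi\rangle^2 \;=\; |0\rangle_x \otimes |0\rangle \;=\; \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right) |0\rangle
$$

going into CNOT:


[Circuit diagram: $(|0\rangle + |1\rangle)/\sqrt{2}$ enters the control (A) line and $|0\rangle$ enters the target (B) line of CNOT; both output lines are labeled with question marks.]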


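When presented with a superposition state into either the A or B register, back
away very slowly from your circuit diagram. Turn, instead, to the linear algebra,
which never lies. The separable state should be resolved to its tensor basis form by
distributing the product over the sums,

$$
\left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right) |0\rangle \;=\; \frac{1}{\sqrt{2}}\,|00\rangle + \frac{1}{\sqrt{2}}\,|10\rangle .
$$

[Exception: If you have a separable operator as well as separable input state, we
don’t need to expand the input state along the CBS, as the definition of separable
operator allows us to apply the component operators individually to the component
vectors. CNOT is not separable, so we have to expand.]


Now, apply CNOT using linearity,

$$
\begin{aligned}
\text{CNOT}\left( \frac{1}{\sqrt{2}}\,|00\rangle + \frac{1}{\sqrt{2}}\,|10\rangle \right)
&= \frac{1}{\sqrt{2}}\,\text{CNOT}\big( |00\rangle \big) + \frac{1}{\sqrt{2}}\,\text{CNOT}\big( |10\rangle \big) \\[1ex]
&= \frac{1}{\sqrt{2}}\,|00\rangle + \frac{1}{\sqrt{2}}\,|11\rangle \\[1ex]
&= \frac{|00\rangle + |11\rangle}{\sqrt{2}} \,.
\end{aligned}
$$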


This is the true output of the gate for the presented input. It is not separable as is 
obvious by its simplicity; there are only two ways we might factor it: pulling out an 
A-ket (a vector in the first H space) or pulling out a B-ket (a vector in the second H 
space), and neither works. 
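The same computation can be carried out with coordinates. This is a minimal sketch: `CNOT` is the matrix derived earlier in this section, while the helper names `tensor` and `apply` and the variable names are assumptions made for the illustration.

```python
import math

# CNOT in the A-major CBS ordering |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def tensor(v, w):
    """A-major tensor product of two coordinate vectors given as lists."""
    return [x * y for x in v for y in w]


def apply(matrix, state):
    """Multiply a square matrix (list of rows) by a coordinate vector."""
    return [sum(m * c for m, c in zip(row, state)) for row in matrix]


s = 1 / math.sqrt(2)
psi_in = tensor([s, s], [1, 0])    # |0>_x tensor |0>, expanded along the CBS
psi_out = apply(CNOT, psi_in)

print(psi_in)     # amplitudes of |00>, |01>, |10>, |11> going in
print(psi_out)    # amplitudes coming out
```

Going in, the nonzero amplitudes sit on $|00\rangle$ and $|10\rangle$; coming out, they sit on $|00\rangle$ and $|11\rangle$, which is the state $(|00\rangle + |11\rangle)/\sqrt{2}$ found above.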


(Repeat of) Definition. An entangled state in a product space is 
one that is not separable. 


Getting back to the circuit diagram, we see there is nothing whatsoever we can 
place in the question marks that would make that circuit sensible. Anything we 
might try would make it appear as though we had a separable product on the RHS, 
which we do not. The best we can do is consolidate the RHS of the gate into a szngle 
bipartite qubit, indeed, an entangled state. 


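[Circuit diagram: CNOT with A-register input $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and B-register input $|0\rangle$; the two output lines are bracketed together and labeled with the single bipartite state $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$.]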


With an entangled output state such as this, measuring one output register causes the 
collapse of both registers. We use this property frequently when designing quantum 
algorithms. 


Individual Measurement of Output Registers. Although we may have an 
entangled state at the output of a gate, we are always allowed to measure each 
register separately. No one can stop us from doing so; the two registers are distinct 
physical entities at separate locations in the computer (or universe). Entanglement 
and non-locality mean that the registers are connected to one another. Our intent to 
measure one register must be accompanied by the awareness that, when dealing with 
an entangled state, doing so will affect the other register’s data. 


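When CNOT's Control Register Gets a CBS ...

Now, consider the separable state

$$|\psi\rangle^2 \;=\; |1\rangle \otimes |0\rangle_x \;=\; |1\rangle\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)$$

going into CNOT:

[Circuit diagram: CNOT with A-register (control) input $|1\rangle$ and B-register (target) input $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$.]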


I have chosen the A-register input to be |1) for variety. It could have been |0) with 
the same (as of yet, undisclosed) outcome. The point is that this time our A-register 
is a CBS while the B-register is a superposition. We know from experience to ignore 
the circuit diagram and turn to the linear algebra. 


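$$|1\rangle\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right) \;=\; \frac{1}{\sqrt{2}}\,|1\rangle|0\rangle \;+\; \frac{1}{\sqrt{2}}\,|1\rangle|1\rangle$$

and we apply CNOT

$$\begin{aligned}
\text{CNOT}\left(\frac{1}{\sqrt{2}}|10\rangle + \frac{1}{\sqrt{2}}|11\rangle\right)
 &= \frac{1}{\sqrt{2}}\,\text{CNOT}\big(|10\rangle\big) + \frac{1}{\sqrt{2}}\,\text{CNOT}\big(|11\rangle\big) \\
 &= \frac{1}{\sqrt{2}}\,|11\rangle + \frac{1}{\sqrt{2}}\,|10\rangle \\
 &= |1\rangle\left(\frac{|1\rangle + |0\rangle}{\sqrt{2}}\right)
\end{aligned}$$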


Aha — separable. That’s because the control-bit (the A-register) is a CBS; it does not 
change during the linear application of CNOT so will be conveniently available for 
factoring at the end. Therefore, for this input, we are authorized to label the output 
registers, individually. 


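[Circuit diagram: CNOT with A-register input $|1\rangle$ and output $|1\rangle$, and B-register input $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and output $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$.]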


The two-qubit output state is unchanged. Not so fast. You have to do an ... 


[Exercise. We are told that a |1) going into CNOT’s control register means 
we flip the B-register bit. Yet, the output state of this binary gate is the same as 
the input state. Explain. Hint: Try the same example with a B-register input of 
Ce ae 

[Exercise. Compute CNOT of an input tensor $|1\rangle|1\rangle_x$. Does CNOT leave this 
state unchanged?]


Summary. A CBS ket going into the control register (A) of a CNOT gate allows 
us to preserve the two registers at the output: we do, indeed, get a separable state 
out, with the control register output identical to the control register input. This is 
true even if a superposition goes into the target register (B). If a superposition goes 
into the control register, however, all bets are off (i.e., entanglement emerges at the 
output). 


What the Phrase “The A-Register is Unchanged” Really Means 


The A, or control, register of the CNOT gate is said to be unaffected by the CNOT 
gate, although this is overstating the case; it gives the false impression that a separable 
bipartite state into CNOT results in a separable state out, which we see is not the 
case. Yet, there are at least two ways to interpret this characterization. 


1. When a CBS state (of the preferred, z-basis) is presented to CNOT’s A-register, 
the output state is, indeed, separable, with the A-register unchanged. 


2. If a non-trivial superposition state is presented to CNOT’s A-register, the mea- 
surement probabilities (relative to the z-basis) of the A-register output are pre- 
served. 


We have already demonstrated item 1, so let’s look now at item 2. The general 
state, expressed along the natural basis is 


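$$|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle,$$

and CNOTing this state gives

$$\text{CNOT}\,|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \delta\,|1\rangle|0\rangle + \gamma\,|1\rangle|1\rangle.$$

If we were to measure this state, we know that it would collapse to a CBS in $\mathcal{H} \otimes \mathcal{H}$. 
With what probability would we find that the first CBS (register-A) in this state had 
collapsed to $|0\rangle$? $|1\rangle$? 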


We have not had our formal lesson on probability yet, but you can no doubt 
appreciate that, of the four possible CBS outcomes, two of them find the A-register 
collapsing to $|0\rangle_A$ (they are distinct, mutually exclusive outcomes because the B-register of 
one outcome is $|0\rangle_B$, and of the other outcome is $|1\rangle_B$). Therefore, to get the overall 
probability that A collapses to $|0\rangle_A$, we simply add those two probabilities: 

$$\begin{aligned}
P\big(\text{A-reg output} \searrow |0\rangle\big)
 &= P\big(\text{CNOT}\,|\psi\rangle^2 \searrow |00\rangle\big) + P\big(\text{CNOT}\,|\psi\rangle^2 \searrow |01\rangle\big) \\
 &= |\alpha|^2 + |\beta|^2,
\end{aligned}$$


which is exactly the same probability of measuring a 0 on the input, $|\psi\rangle^2$, prior to 
applying CNOT. 
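
The claim can be spot-checked numerically. The following is only a sketch under stated assumptions: NumPy, A-major coordinates $(\alpha, \beta, \gamma, \delta)$, a randomly generated state, and a hypothetical helper `prob_A` that adds the two mutually exclusive outcome probabilities for a given A-register reading.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random normalized state (alpha, beta, gamma, delta) in A-major coordinates
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def prob_A(state, bit):
    """Hypothetical helper: probability that register A reads `bit`,
    summed over both possible B-register outcomes."""
    probs = np.abs(state) ** 2
    return probs[2 * bit] + probs[2 * bit + 1]

out = CNOT @ psi          # (alpha, beta, delta, gamma)

p0_before, p0_after = prob_A(psi, 0), prob_A(out, 0)
p1_before, p1_after = prob_A(psi, 1), prob_A(out, 1)
print(p0_before, p0_after)
print(p1_before, p1_after)
```

A single random state proves nothing, of course; the algebra above is the actual argument, and the code only illustrates it.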

[Exercise. We did not compute the probability of measuring a “0” on the input, 
$|\psi\rangle^2$. Do that to confirm the claim.] 


[Exercise. What trait of QM (and postulate) tells us that the individual 
probabilities are $|\alpha|^2$ and $|\beta|^2$?] 

[Exercise. Compute the probabilities of measuring a “1” in the A-register both 
before and after the CNOT gate. Caution: This doesn’t mean that we would measure 
the same prepared state before and after CNOT. Due to the collapse of the state after 
any measurement, we must prepare many identical states and measure some before 
and others after, then examine the outcome frequencies to see how the experimental 
probabilities compare.] 


Measuring Using a Different CBS 


Now we come to the example that I promised: measuring in a basis different from the 
one used to define and express the matrix for the gate. Let’s present the following 
four bipartite states 


$$|00\rangle_x, \quad |01\rangle_x, \quad |10\rangle_x, \quad \text{and} \quad |11\rangle_x$$


to the input of CNOT and look at the output (do a measurement) in terms of the 
x-basis (which consists of those four tensors). 


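$|00\rangle_x$ : This one is easy because the z-coordinates are all the same. 

$$|00\rangle_x \;=\; |0\rangle_x|0\rangle_x \;=\; \left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right) \;=\; \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

CNOT applied to $|00\rangle_x$ swaps the last two z-coordinates, which are identical, 
so it is unchanged. 

$$\text{CNOT}\,|00\rangle_x \;=\; |00\rangle_x$$

(Use any method you learned to confirm this rigorously.) 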


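$|01\rangle_x$ : Just repeat, but watch for signs. 

$$|01\rangle_x \;=\; |0\rangle_x|1\rangle_x \;=\; \left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right) \;=\; \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}$$

Let’s use the matrix to get a quick result. 

$$\text{CNOT}\,|01\rangle_x \;=\; \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} \;=\; \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}$$

From here it’s easier to expand along the z-basis so we can factor, 

$$\begin{aligned}
\text{CNOT}\,|01\rangle_x
 &= \frac{1}{2}\Big(|0\rangle|0\rangle - |0\rangle|1\rangle - |1\rangle|0\rangle + |1\rangle|1\rangle\Big) \\
 &= \frac{1}{2}\Big(|0\rangle|0\rangle - |1\rangle|0\rangle - |0\rangle|1\rangle + |1\rangle|1\rangle\Big) \\
 &= \frac{1}{2}\Big(\big(|0\rangle - |1\rangle\big)|0\rangle - \big(|0\rangle - |1\rangle\big)|1\rangle\Big) \\
 &= \frac{1}{2}\Big(\big(|0\rangle - |1\rangle\big)\big(|0\rangle - |1\rangle\big)\Big) \\
 &= \left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right) \;=\; |1\rangle_x|1\rangle_x \;=\; |11\rangle_x .
\end{aligned}$$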


What is this? Looking at it in terms of the x-basis, it left the B-register 
unchanged at $|1\rangle_x$, but flipped the A-register from $|0\rangle_x$ to $|1\rangle_x$. 


Looking back at the $|00\rangle_x$ case, we see that when the B-register held a qubit in 
the $S_x$ = “0” state, the A-register was unaffected relative to the x-basis. 


This looks suspiciously as though the B-register is now the control bit and A is 
the target bit and, in fact, a computation of the remaining two cases, $|10\rangle_x$ and 
$|11\rangle_x$, would bear this out. 


$|10\rangle_x$ : $\text{CNOT}\,|10\rangle_x = |10\rangle_x$ [Exercise.] 


$|11\rangle_x$ : $\text{CNOT}\,|11\rangle_x = |01\rangle_x$ [Exercise.] 


Summarizing, we see that 


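$$\begin{aligned}
\text{CNOT}\,|00\rangle_x &= |00\rangle_x \\
\text{CNOT}\,|01\rangle_x &= |11\rangle_x \\
\text{CNOT}\,|10\rangle_x &= |10\rangle_x \\
\text{CNOT}\,|11\rangle_x &= |01\rangle_x ,
\end{aligned}$$

demonstrating that, in the x-basis, the B-register is the control and the A-register is the 
target. 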
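
The four x-basis results can be tabulated mechanically. What follows is a rough sketch, assuming NumPy and A-major z-coordinates; the helper `x_ket` and the dictionary `table` are illustrative names, not notation from the text. For each x-basis input, the code applies the ordinary CNOT matrix and then searches for the x-basis ket that equals the output.

```python
import numpy as np
from itertools import product

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
x_kets = [H[:, 0], H[:, 1]]        # |0>_x and |1>_x are the columns of H

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def x_ket(a, b):
    """Hypothetical helper: z-coordinates of the x-basis tensor |a>_x |b>_x."""
    return np.kron(x_kets[a], x_kets[b])

table = {}
for a, b in product((0, 1), repeat=2):
    out = CNOT @ x_ket(a, b)
    # Identify which x-basis ket the output equals
    for c, d in product((0, 1), repeat=2):
        if np.allclose(out, x_ket(c, d)):
            table[(a, b)] = (c, d)

for (a, b), (c, d) in table.items():
    print(f"CNOT |{a}{b}>_x = |{c}{d}>_x")
```

In every row the B label is untouched and the A label becomes `a XOR b`, which is the "B controls A" pattern described above.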


[Preview. We are going to revisit this in a circuit later today. For now, we’ll call 
it an “upside-down” action of the CNOT gate relative to the x-basis and later see how 
to turn it into an actual “upside-down” CNOT gate for the natural CBS kets. When 
we do that, we’ll call it “C↑NOT,” because it will be controlled from bottom-up in 
the z-basis. So far, however, we’ve only produced this bottom-up behavior in the 
x-basis, so the gate name does not change.] 


What About Measurement? I advertised this section as a study of measure- 
ment, yet all I did so far was make observations about the separable components of 
the output — which was an eye-opener in itself. Still, let’s bring it back to the topic 
of measurement. 


Take any of the input states, say $|10\rangle_x$. Then the above results say that 

$$\text{CNOT}\,|10\rangle_x \;=\; |10\rangle_x .$$

To turn this into a statement about measurement probabilities, we “dot” the output 
state with the x-basis kets to get the four amplitudes. By orthogonality of CBS, 

$$\begin{aligned}
{}_x\langle 00 \,|\, 10\rangle_x &= 0 \\
{}_x\langle 01 \,|\, 10\rangle_x &= 0 \\
{}_x\langle 10 \,|\, 10\rangle_x &= 1 \\
{}_x\langle 11 \,|\, 10\rangle_x &= 0 ,
\end{aligned}$$


producing the measurement probability of 100% for the state $|10\rangle_x$. In other words, 
for this state, which has a B input of 0 (in x-coordinates), its output remains 0 
with certainty, while A’s 1 (again, x-coordinates) is unchanged, also with certainty. 
On the other hand, the input state $|11\rangle_x$ gave us an output of $|01\rangle_x$, so the output 
amplitudes become 

$$\begin{aligned}
{}_x\langle 00 \,|\, 01\rangle_x &= 0 \\
{}_x\langle 01 \,|\, 01\rangle_x &= 1 \\
{}_x\langle 10 \,|\, 01\rangle_x &= 0 \\
{}_x\langle 11 \,|\, 01\rangle_x &= 0 .
\end{aligned}$$

Here the input, whose B x-basis component is 1, turns into an output with the B 
x-basis component remaining 1 (with certainty) and an A x-basis input 1 becoming 
flipped to 0 at the output (also with certainty). 


This demonstrates that such statements like “The A-register is left unchanged” 
or “The A register is the control qubit,” are loosey-goosey terms that must be taken 
with a grain of salt. They are vague for non-separable states (as we saw, earlier) and 
patently false for measurements in alternate CBSs. 




11.3.5 The Second-Order Hadamard Gate 


We construct the second order Hadamard gate by forming the tensor product of two 
first order gates, so we’d better first review that operator. 


Review of the First Order Hadamard Gate 


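Recall that the first order Hadamard gate operates on the 2-dimensional Hilbert space, 
$\mathcal{H}$, of a single qubit according to its effect on the CBS states 

$$\begin{aligned}
H\,|0\rangle &= |0\rangle_x = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \\
H\,|1\rangle &= |1\rangle_x = \frac{|0\rangle - |1\rangle}{\sqrt{2}} ,
\end{aligned}$$

which, in an exercise, you showed was equivalent to the CBS formula 

$$H\,|x\rangle \;=\; \frac{|0\rangle + (-1)^x\,|1\rangle}{\sqrt{2}} ,$$

and that it affects a general qubit state, $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, according to 

$$H\,|\psi\rangle \;=\; \left(\frac{\alpha + \beta}{\sqrt{2}}\right)|0\rangle + \left(\frac{\alpha - \beta}{\sqrt{2}}\right)|1\rangle .$$

We learned that its matrix is 

$$M_H \;=\; \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} .$$


Definition and Symbol 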


Definition. The second order Hadamard gate, also called the two-qubit or binary 
Hadamard gate, is the tensor product of two single-qubit Hadamard gates, 


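$$H^{\otimes 2} \;\equiv\; H \otimes H .$$


Notation. You will see both $H \otimes H$ and $H^{\otimes 2}$ when referring to the second order 
Hadamard gate. In a circuit diagram, it looks like this 

[Circuit diagram: a single box labeled $H^{\otimes 2}$ spanning two separate wires.]

when we want to show the separate A and B registers, or like this 

[Circuit diagram: a box labeled $H^{\otimes 2}$ on a single multi-pipe wire.]

when we want to condense the input and output pipes into a multi-pipe. However, it 
is often drawn as two individual H gates applied in parallel, 

[Circuit diagram: two wires, each passing through its own H gate.]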

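$|x\rangle|y\rangle$ : Action on the CBS 


When a two-qubit operator is defined as a tensor product of two single-qubit 
operators, we get its action on CBS kets free of charge, compliments of the definition of 
product operator. Recall that the tensor product of two operators $T_1$ and $T_2$ requires 
that its action on separable tensors be 

$$[T_1 \otimes T_2](v \otimes w) \;\equiv\; T_1 v \otimes T_2 w .$$

This forces the action of $H \otimes H$ on the $\mathcal{H}_{(2)}$ computational basis state $|0\rangle|0\rangle$ (for 
example) to be 

$$[H \otimes H]\big(|0\rangle|0\rangle\big) \;=\; H|0\rangle \, H|0\rangle \;=\; \left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right) .$$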


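Separability of CBS Output. Let’s pause a moment to appreciate that when an 
operator in $\mathcal{H}_{(2)}$ is a pure product of individual operators, as this one is, CBS states 
always map to separable states. We can see this in the last result, and we know it will 
happen for the other three CBS states. 


CBS Output in z-Basis Form. Separable or not, it’s always good to have 
the basis-expansion of the gate output for the four CBS kets. Multiplying it out for 
$H^{\otimes 2}\big(|0\rangle|0\rangle\big)$, we find 

$$H^{\otimes 2}\big(|0\rangle|0\rangle\big) \;=\; \frac{|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle + |1\rangle|1\rangle}{2} .$$

Doing the same thing for all four CBS kets, we get the identities 

$$\begin{aligned}
H^{\otimes 2}\,|0\rangle|0\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle + |1\rangle|1\rangle}{2} \\
H^{\otimes 2}\,|0\rangle|1\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle + |1\rangle|0\rangle - |1\rangle|1\rangle}{2} \\
H^{\otimes 2}\,|1\rangle|0\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle - |1\rangle|0\rangle - |1\rangle|1\rangle}{2} \\
H^{\otimes 2}\,|1\rangle|1\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle - |1\rangle|0\rangle + |1\rangle|1\rangle}{2} .
\end{aligned}$$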


Condensed Form #1. There is a single-formula version of these four CBS results, 
and it is needed when we move to three or more qubits, so we had better develop it 
now. However, it takes a little explanation, so we allow a short side trip. 


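First, let’s switch to encoded notation (0 ↔ 00, 1 ↔ 01, 2 ↔ 10 and 3 ↔ 11), 
and view the above in the equivalent form, 

$$\begin{aligned}
H^{\otimes 2}\,|0\rangle^2 &= \frac{|0\rangle^2 + |1\rangle^2 + |2\rangle^2 + |3\rangle^2}{2} \\
H^{\otimes 2}\,|1\rangle^2 &= \frac{|0\rangle^2 - |1\rangle^2 + |2\rangle^2 - |3\rangle^2}{2} \\
H^{\otimes 2}\,|2\rangle^2 &= \frac{|0\rangle^2 + |1\rangle^2 - |2\rangle^2 - |3\rangle^2}{2} \\
H^{\otimes 2}\,|3\rangle^2 &= \frac{|0\rangle^2 - |1\rangle^2 - |2\rangle^2 + |3\rangle^2}{2} .
\end{aligned}$$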


Next, I will show (with your help) that all four of these can be summarized by the 
single equation (note that x goes from the encoded value 0 to 3) 

$$H^{\otimes 2}\,|x\rangle^2 \;=\; \frac{(-1)^{x \odot 0}\,|0\rangle^2 + (-1)^{x \odot 1}\,|1\rangle^2 + (-1)^{x \odot 2}\,|2\rangle^2 + (-1)^{x \odot 3}\,|3\rangle^2}{2} ,$$

where the operator $\odot$ in the exponent of $(-1)$ is the mod-2 dot product that was 
defined for the unusual vector space, $\mathbb{B}^2$, during the lesson on single qubits. I'll 
repeat it here in the current context. For two CBS states, 

$$x = x_1 x_0 \quad \text{and} \quad y = y_1 y_0 ,$$

where the RHS represents the two-bit string of the CBS (“00,” “01,” etc.), we define 

$$x \odot y \;\equiv\; x_1 y_1 \oplus x_0 y_0 ,$$

that is, we multiply corresponding bits and take their mod-2 sum. To get overly 
explicit, here are a few computed mod-2 dot products: 

    x = x_1 x_0  |  y = y_1 y_0  |  x ⊙ y
    -------------+---------------+-------
    3 = 11       |  3 = 11       |  0
    1 = 01       |  1 = 01       |  1
    0 = 00       |  2 = 10       |  0
    1 = 01       |  3 = 11       |  1
    1 = 01       |  2 = 10       |  0


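I'll now confirm that the last expression presented above for $H^{\otimes 2}|x\rangle^2$, when applied 
to the particular CBS $|3\rangle^2 \leftrightarrow |1\rangle|1\rangle$, gives the right result: 

$$\begin{aligned}
H^{\otimes 2}\,|3\rangle^2
 &= \frac{(-1)^{3 \odot 0}\,|0\rangle^2 + (-1)^{3 \odot 1}\,|1\rangle^2 + (-1)^{3 \odot 2}\,|2\rangle^2 + (-1)^{3 \odot 3}\,|3\rangle^2}{2} \\
 &= \frac{(-1)^{0}\,|0\rangle^2 + (-1)^{1}\,|1\rangle^2 + (-1)^{1}\,|2\rangle^2 + (-1)^{0}\,|3\rangle^2}{2} \\
 &= \frac{|0\rangle^2 - |1\rangle^2 - |2\rangle^2 + |3\rangle^2}{2} ,
\end{aligned}$$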


which matches the final of our four CBS equations for H®?. Now, for the help I 
warned you would be giving me. 


[Exercise. Show that this formula produces the remaining three CBS expressions 
for $H^{\otimes 2}$.] 


We now present the condensed form using summation notation, 

$$H^{\otimes 2}\,|x\rangle^2 \;=\; \frac{1}{2}\sum_{y=0}^{3} (-1)^{x \odot y}\,|y\rangle^2 .$$

Most authors typically don’t use the “$\odot$” for the mod-2 dot product, but stick with a 
simpler “$\cdot$”, and add some verbiage to the effect that “this is a mod-2 dot product...,” 
in which case you would see it as 

$$H^{\otimes 2}\,|x\rangle^2 \;=\; \frac{1}{2}\sum_{y=0}^{3} (-1)^{x \cdot y}\,|y\rangle^2 .$$


Putting this result into the circuit diagram gives us 


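[Circuit diagram: $|x\rangle^2$ enters $H^{\otimes 2}$ on two separate wires and $\frac{1}{2}\sum_{y=0}^{3} (-1)^{x \cdot y}\,|y\rangle^2$ leaves.]

or, using the multi-channel pipe notation, 

[Circuit diagram: $|x\rangle^2$ enters $H^{\otimes 2}$ on a single multi-pipe wire and $\frac{1}{2}\sum_{y=0}^{3} (-1)^{x \cdot y}\,|y\rangle^2$ leaves.]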
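
Condensed form #1 is easy to test by machine. Here is a small sketch, assuming NumPy and the encoded ordering 0 ↔ 00, 1 ↔ 01, 2 ↔ 10, 3 ↔ 11; the function name `mod2_dot` is a hypothetical stand-in for the $\odot$ operation. It builds the matrix whose column $x$ holds the amplitudes $\frac{1}{2}(-1)^{x \odot y}$ and compares it with the tensor product of two single-qubit Hadamard matrices.

```python
import numpy as np

def mod2_dot(x, y):
    """Hypothetical helper for the mod-2 dot product: multiply corresponding
    bits of x and y and add mod 2 (the parity of the bitwise AND)."""
    return bin(x & y).count("1") % 2

N = 4  # number of two-qubit CBS kets, encoded 0..3

# Entry (y, x) is the amplitude of |y>^2 in H^{(x)2} |x>^2 per condensed form #1
H2_formula = np.array([[(-1) ** mod2_dot(x, y) for x in range(N)]
                       for y in range(N)]) / 2

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2_kron = np.kron(H, H)            # tensor product of two first-order gates

print(np.allclose(H2_formula, H2_kron))
print(mod2_dot(3, 3), mod2_dot(1, 1), mod2_dot(0, 2), mod2_dot(1, 3), mod2_dot(1, 2))
```

Changing `N` to 8 and comparing against `np.kron(H, np.kron(H, H))` is one way to see why this form generalizes to three qubits, although that case is beyond what is derived here.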


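Condensed Form #2. You may have noticed that the separability of the CBS 
outputs, combined with the expression we already had for a single-qubit Hadamard, 

$$H\,|x\rangle \;=\; \frac{|0\rangle + (-1)^x\,|1\rangle}{\sqrt{2}} ,$$

gives us another way to express $H^{\otimes 2}$ in terms of the CBS. Once again, invoking the 
definition of a product operator, we get 

$$\begin{aligned}
[H \otimes H]\big(|x\rangle|y\rangle\big)
 &= H|x\rangle \otimes H|y\rangle \\
 &= \left(\frac{|0\rangle + (-1)^x\,|1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle + (-1)^y\,|1\rangle}{\sqrt{2}}\right) ,
\end{aligned}$$

which, with much less fanfare than condensed form #1, produces a nice separable 
circuit diagram definition for the binary Hadamard: 

[Circuit diagram: $H^{\otimes 2}$ with A-register input $|x\rangle$ and output $\frac{|0\rangle + (-1)^x|1\rangle}{\sqrt{2}}$, and B-register input $|y\rangle$ and output $\frac{|0\rangle + (-1)^y|1\rangle}{\sqrt{2}}$.]

which I will call “condensed form #2”. 

(Caution: The y in this circuit diagram labels the B-register CBS ket, a totally 
different use from its indexing of the summation formula in condensed form #1. 
However, in both condensed forms the use of y is standard.) 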


The reason for the hard work of condensed form #1 is that it is easier to generalize 
to higher order qubit spaces, and proves more useful in those contexts. 

[Exercise. Multiply-out form #2 to show it in terms of the four CBS tensor kets.] 

[Exercise. The result of the last exercise will look very different from condensed 
form #1. By naming your bits and variables carefully, demonstrate that both results 
produce the same four scalar amplitudes “standing next to” the CBS tensor terms.] 


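The Matrix 


This time we have two different approaches available. As before, we can compute 
the column vectors of the matrix by applying $H^{\otimes 2}$ to the CBS tensors. However, 
the separability of the operator allows us to use the theory of tensor products to 
write down the product matrix based on the two component matrices. The need for 
frequent sanity checks wired into our collective computer science mindset induces us 
to do both. 


Method #1: Tensor Theory. 

Using the standard A-major (B-minor) method, a separable operator $A \otimes B$ can 
be immediately written down using our formula 

$$\begin{pmatrix}
a_{00} & a_{01} & \cdots & a_{0(l-1)} \\
a_{10} & a_{11} & \cdots & a_{1(l-1)} \\
\vdots & \vdots & \ddots & \vdots
\end{pmatrix}
\otimes
\begin{pmatrix}
b_{00} & b_{01} & \cdots & b_{0(m-1)} \\
b_{10} & b_{11} & \cdots & b_{1(m-1)} \\
\vdots & \vdots & \ddots & \vdots
\end{pmatrix}
\;=\;
\begin{pmatrix}
a_{00}\begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} &
a_{01}\begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & \cdots \\
a_{10}\begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} &
a_{11}\begin{pmatrix} b_{00} & b_{01} & \cdots \\ b_{10} & b_{11} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix} .$$

For us, this means 

$$\begin{aligned}
H \otimes H
 &= \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \\
 &= \frac{1}{\sqrt{2}}\begin{pmatrix}
 \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} & \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \\
 \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} & -\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
 \end{pmatrix} \\
 &= \frac{1}{2}\begin{pmatrix}
 1 & 1 & 1 & 1 \\
 1 & -1 & 1 & -1 \\
 1 & 1 & -1 & -1 \\
 1 & -1 & -1 & 1
 \end{pmatrix} ,
\end{aligned}$$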


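a painlessly obtained result. 

Method #2: Linear Algebra Theory. 

Our earlier technique, which works for any transformation, separable or not, 
instructs us to place the output images of the four CBS kets into columns of the desired 
matrix. Using our four expressions for $H^{\otimes 2}|x\rangle|y\rangle$ presented initially, we learn 

$$\begin{aligned}
H^{\otimes 2}\,|00\rangle &= \frac{1}{2}\big(|00\rangle + |01\rangle + |10\rangle + |11\rangle\big) = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \\
H^{\otimes 2}\,|01\rangle &= \frac{1}{2}\big(|00\rangle - |01\rangle + |10\rangle - |11\rangle\big) = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix} \\
H^{\otimes 2}\,|10\rangle &= \frac{1}{2}\big(|00\rangle + |01\rangle - |10\rangle - |11\rangle\big) = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix} \\
H^{\otimes 2}\,|11\rangle &= \frac{1}{2}\big(|00\rangle - |01\rangle - |10\rangle + |11\rangle\big) = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} .
\end{aligned}$$

Therefore, 

$$\begin{aligned}
M_{H^{\otimes 2}}
 &= \Big( H^{\otimes 2}|00\rangle ,\; H^{\otimes 2}|01\rangle ,\; H^{\otimes 2}|10\rangle ,\; H^{\otimes 2}|11\rangle \Big) \\
 &= \frac{1}{2}\begin{pmatrix}
 1 & 1 & 1 & 1 \\
 1 & -1 & 1 & -1 \\
 1 & 1 & -1 & -1 \\
 1 & -1 & -1 & 1
 \end{pmatrix} ,
\end{aligned}$$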


happily, the same result. 
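
Both methods can be mirrored in a few lines of code. This is a sketch only, assuming NumPy, whose `np.kron` follows the same A-major (B-minor) block layout used above. For Method #2 the code obtains each column as $(H|x\rangle) \otimes (H|y\rangle)$, which leans on separability rather than on the four expanded expressions, so it is a shortcut and not a literal transcription of the text's second method.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ket = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # |0>, |1>

# Method #1: tensor theory (Kronecker product in A-major order)
M_tensor = np.kron(H, H)

# Method #2: stack the images of |00>, |01>, |10>, |11> as columns
columns = [np.kron(H @ ket[x], H @ ket[y]) for x in (0, 1) for y in (0, 1)]
M_columns = np.column_stack(columns)

expected = np.array([[1,  1,  1,  1],
                     [1, -1,  1, -1],
                     [1,  1, -1, -1],
                     [1, -1, -1,  1]]) / 2

print(np.allclose(M_tensor, expected))
print(np.allclose(M_columns, expected))
```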


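[Exercise. Complete the sanity check by confirming that this is a unitary matrix.] 


$|\psi\rangle^2$ : Behavior on General State 


To a general 

$$|\psi\rangle^2 \;=\; \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}$$

we apply our operator 

$$H^{\otimes 2}\,|\psi\rangle^2 \;=\; \frac{1}{2}\begin{pmatrix}
 1 & 1 & 1 & 1 \\
 1 & -1 & 1 & -1 \\
 1 & 1 & -1 & -1 \\
 1 & -1 & -1 & 1
\end{pmatrix}\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\;=\; \frac{1}{2}\begin{pmatrix}
 \alpha + \beta + \gamma + \delta \\
 \alpha - \beta + \gamma - \delta \\
 \alpha + \beta - \gamma - \delta \\
 \alpha - \beta - \gamma + \delta
\end{pmatrix} ,$$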


which shows the result of applying the two-qubit Hadamard to any state. 


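Measurement 


There is no concise phrase we can use to describe how the binary Hadamard affects 
measurement probabilities of a general state. We must be content to describe it in 
terms of the algebra. For example, testing at point P, 

[Circuit diagram: $|\psi\rangle^2$ enters $H^{\otimes 2}$ and $H^{\otimes 2}|\psi\rangle^2$ leaves; point P is marked on the input side of the gate and point Q on the output side.]

collapses $|\psi\rangle^2$ to $|00\rangle$ with probability $|\alpha|^2$ (as usual). Waiting, instead, to take the 
measurement of $H^{\otimes 2}|\psi\rangle^2$ (point Q), would produce a collapse to that same $|00\rangle$ with 
the probability 

$$\left|\frac{\alpha + \beta + \gamma + \delta}{2}\right|^2 \;=\; \frac{(\alpha + \beta + \gamma + \delta)^{*}\,(\alpha + \beta + \gamma + \delta)}{4} ,$$

an accurate but relatively uninformative fact. 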


Measurement of CBS Output States. However, there is something we can 
say when we present any of the four CBS tensors to our Hadamard. Because it’s 
separable, we can see its output clearly: 


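[Circuit diagram: $H^{\otimes 2}$ with A-register input $|x\rangle$ and output $\frac{|0\rangle + (-1)^x|1\rangle}{\sqrt{2}}$, and B-register input $|y\rangle$ and output $\frac{|0\rangle + (-1)^y|1\rangle}{\sqrt{2}}$; point P is marked on the input side of the gate and point Q on the output side.]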


This tells us that if we measure either register at Q, we will read a “0” or “1” with 
equal (50%) probability, and if we measure both registers at Q, we’ll see a “00,” “01,” 
“10” or “11” also with equal (25%) probability. Compare this to the probabilities at 
P, where we are going to get either of our two input qubit values, “x” and “y” with 
100% probability (or, if we measure both, the bipartite input value “xy” with 100% 
probability) since they are CBS states at the time of measurement. 




Converting Between z-Basis and x-Basis using H®? 


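The induced z-basis for $\mathcal{H}_{(2)}$ is 

$$|0\rangle|0\rangle, \quad |0\rangle|1\rangle, \quad |1\rangle|0\rangle, \quad \text{and} \quad |1\rangle|1\rangle ,$$

while the induced x-basis for $\mathcal{H}_{(2)}$ is 

$$|0\rangle_x|0\rangle_x, \quad |0\rangle_x|1\rangle_x, \quad |1\rangle_x|0\rangle_x, \quad \text{and} \quad |1\rangle_x|1\rangle_x ,$$

or in alternate notation, 

$$|{+}{+}\rangle, \quad |{+}{-}\rangle, \quad |{-}{+}\rangle, \quad \text{and} \quad |{-}{-}\rangle .$$


Since H converts to-and-from the bases $\{|0\rangle, |1\rangle\}$ and $\{|0\rangle_x, |1\rangle_x\}$ in $\mathcal{H}_{(1)}$, it is a 
short two-liner to confirm that the separable $H^{\otimes 2}$ converts to-and-from their induced 
second order counterparts in $\mathcal{H}_{(2)}$. I'll show this for one of the four CBS kets and you 
can do the others. Let’s take the third z-basis ket, $|1\rangle|0\rangle$, 

$$\begin{aligned}
H^{\otimes 2}\,|1\rangle|0\rangle &= [H \otimes H]\big(|1\rangle \otimes |0\rangle\big) = \big(H|1\rangle\big) \otimes \big(H|0\rangle\big) \\
 &= |1\rangle_x|0\rangle_x = |{-}{+}\rangle .
\end{aligned}$$

[Exercise. Do the same for the other three CBS kets.] 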


This gives us a very useful way to view H®? when we are thinking about bases: 


H®? is the transformation that takes the z-CBS to the x-CBS (and since 
it is its own inverse, also takes x-CBS back to the z-CBS). 


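Algebraically, using the alternate $|\pm\rangle$ notation for the x-basis, the forward direction 
is 

$$\begin{aligned}
H^{\otimes 2}\,|00\rangle &= |{+}{+}\rangle , \\
H^{\otimes 2}\,|01\rangle &= |{+}{-}\rangle , \\
H^{\otimes 2}\,|10\rangle &= |{-}{+}\rangle \quad \text{and} \\
H^{\otimes 2}\,|11\rangle &= |{-}{-}\rangle ,
\end{aligned}$$

and the inverse direction is 

$$\begin{aligned}
H^{\otimes 2}\,|{+}{+}\rangle &= |00\rangle , \\
H^{\otimes 2}\,|{+}{-}\rangle &= |01\rangle , \\
H^{\otimes 2}\,|{-}{+}\rangle &= |10\rangle \quad \text{and} \\
H^{\otimes 2}\,|{-}{-}\rangle &= |11\rangle .
\end{aligned}$$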


11.4 Measuring Along Alternate Bases 


In the earlier study, when we were applying the naked CNOT gate directly to kets, 
I described the final step as “measuring the output relative to the x-basis.” At the 
time it was a bit of an informal description. How would it be done practically? 




11.4.1 Measuring Along the x-Basis 


We apply the observation of the last section. When we are ready to measure along 
the x-basis, we insert the quantum gate that turns the x-basis into the z-basis, then 
measure in our familiar z-basis. For a general circuit represented by a single operator 
U we would use 


[Circuit diagram: the two output wires of U each pass through an H gate and then into a meter.]


to measure along the x-basis. (The meter symbols imply a natural z-basis measure- 
ment, unless otherwise stated.) 


11.4.2 Measuring Along any Separable Basis 


Say we have some non-preferred basis, 


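$$C \;=\; \{\,|\xi_0\rangle ,\; |\xi_1\rangle\,\}$$


of our first order Hilbert space, $\mathcal{H}_{(1)}$. Let’s also assume we have the unary operator, 
call it T, that converts from the natural z-CBS to the C-CBS, 

$$\begin{aligned}
T &: \text{z-basis} \longrightarrow C , \\
T\,|0\rangle &= |\xi_0\rangle , \\
T\,|1\rangle &= |\xi_1\rangle .
\end{aligned}$$

Because T is unitary, its inverse is $T^\dagger$, meaning that $T^\dagger$ takes a C-CBS back to a 
z-CBS. 

$$\begin{aligned}
T^\dagger &: C \longrightarrow \text{z-basis} , \\
T^\dagger\,|\xi_0\rangle &= |0\rangle , \\
T^\dagger\,|\xi_1\rangle &= |1\rangle .
\end{aligned}$$

Moving up to our second order Hilbert space, $\mathcal{H}_{(2)}$, the operator that converts from 
the induced 4-element z-basis to the induced 4-element C-basis is $T \otimes T$, while to go 
in the reverse direction we would use $T^\dagger \otimes T^\dagger$, 

$$\begin{aligned}
(T^\dagger \otimes T^\dagger)\,|\xi_{00}\rangle &= |00\rangle , \\
(T^\dagger \otimes T^\dagger)\,|\xi_{01}\rangle &= |01\rangle , \\
(T^\dagger \otimes T^\dagger)\,|\xi_{10}\rangle &= |10\rangle , \\
(T^\dagger \otimes T^\dagger)\,|\xi_{11}\rangle &= |11\rangle .
\end{aligned}$$

To measure a bipartite output state along the induced C-basis, we just apply $T^\dagger \otimes T^\dagger$ 
instead of the Hadamard gate $H \otimes H$ at the end of the circuit: 

[Circuit diagram: the two output wires of U each pass through a $T^\dagger$ gate and then into a meter.]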


Question to Ponder 


How do we vocalize the measurement? Are we measuring the two qubits in the C basis 
or in the z-basis? It is crucial to our understanding that we tidy up our language. 
There are two ways we can say it and they are equivalent, but each is very carefully 
worded, so please meditate on them. 


1. If we apply the $T^\dagger$ gate first, then subsequent measurement will be in the z-basis. 


2. If we are talking about the original output registers of U before applying the $T^\dagger$ 
gates to them, we would say we are “measuring that pair along the C basis.” This 
version has built into it the implication that we are “first sending the qubits 
through $T^\dagger$s, then measuring them in the z-basis.” 


How We Interpret the Final Results 


If we happen to know that, just after the main circuit but before the final 
basis-transforming $T^\dagger \otimes T^\dagger$, we had a C-CBS state, then a final reading that showed the 
binary number “x” (x = 0, 1, 2 or 3) on our meter would imply that the original 
bipartite state before the $T^\dagger$s was $|\xi_x\rangle$. No collapse takes place when we have a CBS 
ket and measure in that CBS basis. 


However, if we had some superposition state of C-CBS kets, $|\psi\rangle^2$, at the end of our 
main circuit but before the basis-transforming $T^\dagger \otimes T^\dagger$, we have to describe things 
probabilistically. Let’s express this superposition as 

$$|\psi\rangle^2 \;=\; c_{00}\,|\xi_{00}\rangle + c_{01}\,|\xi_{01}\rangle + c_{10}\,|\xi_{10}\rangle + c_{11}\,|\xi_{11}\rangle .$$

By linearity, 

$$(T^\dagger \otimes T^\dagger)\,|\psi\rangle^2 \;=\; c_{00}\,|00\rangle + c_{01}\,|01\rangle + c_{10}\,|10\rangle + c_{11}\,|11\rangle ,$$

a result that tells us the probabilities of detecting the four natural CBS kets on our 
final meters are the same as the probabilities of our having detected the corresponding 
C-CBS kets prior to the final $T^\dagger \otimes T^\dagger$ gate. Those probabilities are, of course, $|c_x|^2$, 
for x = 0, 1, 2 or 3. 
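
To make the bookkeeping concrete, here is a rough numerical sketch, assuming NumPy. The basis-change operator `T` is an arbitrary unitary produced from the QR decomposition of a random matrix (so its columns play the role of $|\xi_0\rangle$ and $|\xi_1\rangle$), and the amplitudes `c` are random; none of these particular values come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary single-qubit unitary T (assumption: any unitary will do);
# its columns are the z-coordinates of |xi_0> and |xi_1>.
Z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
T, _ = np.linalg.qr(Z)

# Random normalized amplitudes c_00, c_01, c_10, c_11
c = rng.normal(size=4) + 1j * rng.normal(size=4)
c /= np.linalg.norm(c)

TT = np.kron(T, T)                  # z-basis -> induced C-basis
psi = TT @ c                        # sum_x c_x |xi_x>, written in z-coordinates

Td = T.conj().T                     # T-dagger
measured = np.kron(Td, Td) @ psi    # state just before the z-basis meters
probs = np.abs(measured) ** 2       # what the meters report, statistically

print(np.allclose(measured, c))
print(np.allclose(probs, np.abs(c) ** 2))
```

The point of the sketch is only that the z-basis amplitudes after $T^\dagger \otimes T^\dagger$ are the C-basis amplitudes before it.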




11.4.3. The Entire Circuit Viewed in Terms of an Alternate 
Basis 


If we wanted to operate the circuit entirely using the alternate CBS, i.e., giving it 
input states defined using alternate CBS coordinates as well as measuring along the 
alternate basis, we would first create a circuit (represented here by a single operator 
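U) that works in terms of the z-basis, then surround it with T and $T^\dagger$ gates, 

[Circuit diagram: each of the two wires passes through a T gate, then the circuit U, then a $T^\dagger$ gate.]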


11.4.4 Non-Separable Basis Conversion Gates 


While on the topic, I should mention non-separable basis conversion transformations 
(to/from the z-basis). A basis conversion operator doesn’t have to be separable in 
order for it to be useful in measuring along that basis. Like all quantum gates it is 
unitary, so everything we said about separable conversion operators will work. To 
measure along any basis, we find the binary operator, call it S, that takes the z-basis 
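to the other basis and use $S^\dagger$ prior to measurement. 

[Circuit diagram: the two output wires of U pass through the binary gate $S^\dagger$ and then into the meters.]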


An Entire Circuit in a Non-Separable Basis 


If we wanted to design a circuit in terms of input coordinates and measurements 
relative to a non-preferred (and non-separable) basis, we would 


1. first define the circuit in terms of the natural z-basis, then 


2. sandwich it between S and $S^\dagger$ gates. 


The circuit diagram for this is 

[Circuit diagram: the binary gate S, then the circuit U, then the binary gate $S^\dagger$.]


We'll be using both separable and non-separable basis conversion operators today 
and in future lessons. 




11.5 Variations on the Fundamental Binary Qubit 
Gates 

We’ve seen three examples of two-qubit unitary operators: a general learning example, 

the CNOT and the H®?. There is a tier of binary gates which are derivative of one 


or more of those three, and we can place this tier in a kind of “secondary” category 
and thereby leverage and/or parallel the good work we’ve already done. 


11.5.1 Other Controlled Logic Gates 


The Controlled-U Gate 


The CNOT is a special case of a more general gate which “places” any unary (one- 
qubit) operation in the B-register to be controlled by the qubit in the A-register. 
If U is any unary gate, we can form a binary qubit gate called a controlled-U gate 
(or U-operator). Loosely stated, it applies the unary operator on one register’s CBS 
conditionally, based on the CBS qubit going into the other register. 


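The Symbol 


The controlled-U gate is drawn like the CNOT gate with CNOT’s $\oplus$ operation 
replaced by the unary operator, U, that we wish to control: 

[Circuit diagram: a control dot on the A wire joined by a vertical line to a box labeled U on the B wire.]

For example, a controlled-bit flip (= controlled-QNOT) operator would be written 

[Circuit diagram: a control dot on the A wire joined to a box labeled X on the B wire.]

whereas a controlled-phase shift operator, with a shift angle of $\theta$, would be 

[Circuit diagram: a control dot on the A wire joined to a box labeled $R_\theta$ on the B wire.]

The A-register maintains its role of control bit/register, and the B-register the target 
register or, perhaps more appropriately, target (unary) operator. 

[Circuit diagram: the controlled-U gate with the control dot annotated “control register” and the U box annotated “target operator”.]

There is no accepted name for the operator, but I’ll use the notation CU (i.e., 
CX, C($R_\theta$), etc.) in this course. 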


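$|x\rangle|y\rangle$ : Action on the CBS 

It is easiest and most informative to give the effect on the CBS when we know which 
specific operator, U, we are controlling. Yet, even for a general U we can give a formal 
expression using the power (exponent) of the matrix U. As with ordinary integer 
exponents, $U^n$ simply means “multiply U by itself n times” with the usual convention 
that $U^0 = \mathbb{1}$: 

[Circuit diagram: controlled-U with A-register input $|x\rangle$ and output $|x\rangle$, and B-register input $|y\rangle$ and output $U^x|y\rangle$.]

The abstract expression is equivalent to the more understandable 

$$U^x\,|y\rangle \;=\; \begin{cases} |y\rangle , & \text{if } x = 0 \\ U\,|y\rangle , & \text{if } x = 1 . \end{cases}$$

It will help if we see the four CBS results explicitly: 

$$\begin{aligned}
CU\,|00\rangle &= |0\rangle|0\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} , &
CU\,|01\rangle &= |0\rangle|1\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} , \\
CU\,|10\rangle &= |1\rangle\,U|0\rangle = \begin{pmatrix} 0 \\ 0 \\ U_{00} \\ U_{10} \end{pmatrix} , &
CU\,|11\rangle &= |1\rangle\,U|1\rangle = \begin{pmatrix} 0 \\ 0 \\ U_{01} \\ U_{11} \end{pmatrix} .
\end{aligned}$$

Here the $U_{jk}$ are the four matrix elements of $M_U$. 


[Exercise. Verify this formulation.] 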


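The Matrix 


The column vectors of the matrix are given by applying CU to the CBS tensors to 
get 

$$\begin{aligned}
M_{CU} &= \Big( CU|00\rangle ,\; CU|01\rangle ,\; CU|10\rangle ,\; CU|11\rangle \Big) \\
 &= \Big( |00\rangle ,\; |01\rangle ,\; |1\rangle U|0\rangle ,\; |1\rangle U|1\rangle \Big)
 \;=\; \begin{pmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & U_{00} & U_{01} \\
 0 & 0 & U_{10} & U_{11}
 \end{pmatrix} ,
\end{aligned}$$

which can be more concisely expressed as 

$$CU \;=\; \begin{pmatrix} \mathbb{1} & 0 \\ 0 & U \end{pmatrix}$$

or, the even cleaner 

$$CU \;=\; \begin{pmatrix} \mathbb{1} & \\ & U \end{pmatrix} .$$

[Exercise. Prove that this is a separable operator for some U and not separable 
for others. Hint: What did we say about CNOT in this regard?] 


[Exercise. Confirm unitarity.] 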


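$|\psi\rangle^2$ : Behavior on General State 


Applying CU to the general state, 

$$|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,$$

we get 

$$\begin{aligned}
CU\,|\psi\rangle^2 &= \begin{pmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & U_{00} & U_{01} \\
 0 & 0 & U_{10} & U_{11}
 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
 = \begin{pmatrix} \alpha \\ \beta \\ U_{00}\gamma + U_{01}\delta \\ U_{10}\gamma + U_{11}\delta \end{pmatrix} \\
 &= \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + (U_{00}\gamma + U_{01}\delta)\,|1\rangle|0\rangle + (U_{10}\gamma + U_{11}\delta)\,|1\rangle|1\rangle .
\end{aligned}$$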
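
The block structure of $M_{CU}$ translates directly into code. This is an illustrative sketch, assuming NumPy; the function name `controlled` is hypothetical, and the phase-shift matrix is taken to be $R_\theta = \mathrm{diag}(1, e^{i\theta})$, which is a common convention but an assumption here, since this section does not restate that gate's matrix.

```python
import numpy as np

def controlled(U):
    """Hypothetical helper: A-major matrix of CU, with the 2x2 identity in the
    upper-left block and U in the lower-right block."""
    CU = np.eye(4, dtype=complex)
    CU[2:, 2:] = np.asarray(U, dtype=complex)
    return CU

X = np.array([[0, 1], [1, 0]])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

theta = 0.3                                   # arbitrary shift angle
R = np.array([[1, 0], [0, np.exp(1j * theta)]])   # assumed form of the phase shift

psi = np.array([0.5, 0.5j, -0.5, 0.5])        # (alpha, beta, gamma, delta), normalized
out = controlled(R) @ psi

print(np.allclose(controlled(X), CNOT))       # CX is CNOT
print(np.allclose(out[:2], psi[:2]))          # alpha, beta untouched
print(np.allclose(out[2:], R @ psi[2:]))      # (gamma, delta) replaced by U (gamma, delta)
```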




Measurement 


There’s nothing new we have to add here, since measurement probabilities of the 
target-register will depend on the specific U being controlled, and we have already 
established that the measurement probabilities of the control register are unaffected 
(see CNOT discussion). 


The Controlled-Z Gate 


Let’s get specific by looking at CZ. 


The Symbol 


Its symbol in a circuit is 

[Circuit diagram: a control dot on the A wire joined by a vertical line to a box labeled Z on the B wire.]


$|x\rangle|y\rangle$ : Action on the CBS 


Recall that we had what appeared to be an overly abstract expression for the unary 
Z-gate operating on a CBS $|x\rangle$, 

$$Z\,|x\rangle \;=\; (-1)^x\,|x\rangle ,$$

but this actually helps us understand the CZ action using a similar expression: 

$$CZ\,|x\rangle|y\rangle \;=\; (-1)^{xy}\,|x\rangle|y\rangle .$$

[Exercise. Prove this formula is correct. Hint: Apply the wordy definition of a 
controlled Z-gate (“leaves the B-reg alone if ... and applies Z to the B-reg if ...”) to 
the four CBS tensors and compare what you get with the formula $(-1)^{xy}\,|x\rangle|y\rangle$ for 
each of the four combinations of x and y.] 


To see the four CBS results explicitly, 

$$CZ\,|00\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \quad
CZ\,|01\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} , \quad
CZ\,|10\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} , \quad
CZ\,|11\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix} .$$


The Matrix 


The column vectors of the matrix were produced above, so we can write it down 
instantly: 

$$CZ \;=\; \begin{pmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & -1
\end{pmatrix} .$$


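[Exercise. Prove that this (a) is not a separable operator and (b) is unitary.] 


$|\psi\rangle^2$ : Behavior on General State 


Applying CZ to the general state, 

$$|\psi\rangle^2 \;=\; \alpha\,|0\rangle|0\rangle + \beta\,|0\rangle|1\rangle + \gamma\,|1\rangle|0\rangle + \delta\,|1\rangle|1\rangle ,$$

we get 

$$CZ\,|\psi\rangle^2 \;=\; \begin{pmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & -1
\end{pmatrix}\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\;=\; \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ -\delta \end{pmatrix} .$$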


Measurement 


As we noticed with the unary Z-operator, the probabilities of measurement (along the 
preferred CBS) are not affected by the controlled-Z. However, the state is modified. 
You can demonstrate this the same way we did for the unary Z: combine $|\psi\rangle^2$ and 
$CZ\,|\psi\rangle^2$ with a second tensor and show that you (can) produce distinct results. 
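
One way to see the difference numerically is sketched below, assuming NumPy. Note that this is a swapped-in technique: rather than the combination with a second tensor that the text suggests, the sketch sends both states through $H^{\otimes 2}$ and compares the resulting z-basis probabilities. The particular input state $|{+}{+}\rangle$ and the helper name `probs` are illustrative choices.

```python
import numpy as np

CZ = np.diag([1.0, 1.0, 1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2 = np.kron(H, H)

def probs(state):
    """z-basis measurement probabilities of a two-qubit state."""
    return np.abs(state) ** 2

psi = np.full(4, 0.5)          # |+>|+> = (|00> + |01> + |10> + |11>)/2
phi = CZ @ psi                 # (|00> + |01> + |10> - |11>)/2

# Same z-basis probabilities before and after CZ ...
same_z_probs = np.allclose(probs(psi), probs(phi))

# ... but a later gate exposes the difference between the two states
after_psi = probs(H2 @ psi)    # all weight on |00>
after_phi = probs(H2 @ phi)    # spread evenly over the four CBS kets

print(same_z_probs)
print(after_psi)
print(after_phi)
```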


Swapping Roles in Controlled Gates 


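We could have turned (and still can turn) any of our controlled gates upside down: 

[Circuit diagram: a box labeled U on the A wire joined by a vertical line to a control dot on the B wire.]

Now the B-register takes on the role of control, and the A-register becomes the target: 

[Circuit diagram: the same gate with the U box annotated “target operator” and the control dot annotated “control register”.]

Let’s refer to this version of a controlled gate using the notation (caution: not 
seen outside of this course) (C↑)U. 

Everything has to be adjusted accordingly if we insist (which we do) on 
continuing to call the upper register the A-register, producing A-major (B-minor) matrices 
and coordinate representations. For example, the action on the CBS becomes 

[Circuit diagram: upside-down controlled-U with A-register input $|x\rangle$ and output $U^y|x\rangle$, and B-register input $|y\rangle$ and output $|y\rangle$.]

and the matrix we obtain (make sure you can compute this) is 

$$(C{\uparrow})U \;=\; \begin{pmatrix}
 1 & 0 & 0 & 0 \\
 0 & U_{00} & 0 & U_{01} \\
 0 & 0 & 1 & 0 \\
 0 & U_{10} & 0 & U_{11}
\end{pmatrix} ,$$

where the $U_{jk}$ are the four matrix elements of $M_U$. 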
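
As a cross-check on the upside-down matrix, here is a brief sketch, assuming NumPy. It is not the derivation intended by the text: it builds (C↑)U by exchanging the two registers with a SWAP matrix, applying the ordinary CU, and exchanging them back. The SWAP gate and the helper names `controlled` and `controlled_up` are assumptions introduced only for this illustration, and the sample U is an arbitrary rotation.

```python
import numpy as np

# SWAP exchanges the A and B registers: |x>|y> -> |y>|x>  (assumed helper gate)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def controlled(U):
    """Hypothetical helper: ordinary CU (A controls, B is target)."""
    CU = np.eye(4, dtype=complex)
    CU[2:, 2:] = np.asarray(U, dtype=complex)
    return CU

def controlled_up(U):
    """Hypothetical helper: upside-down gate (B controls, A is target)."""
    return SWAP @ controlled(U) @ SWAP

t = 0.4                                        # arbitrary angle, distinct entries
U = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

explicit = np.array([[1, 0,       0, 0      ],
                     [0, U[0, 0], 0, U[0, 1]],
                     [0, 0,       1, 0      ],
                     [0, U[1, 0], 0, U[1, 1]]])

print(np.allclose(controlled_up(U), explicit))
```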


With this introduction, try the following exercise. 


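[Exercise. Prove that the binary quantum gates CZ and (C↑)Z are identical. 
That is, show that 

[Circuit diagram: the CZ gate (control dot on the A wire, Z box on the B wire) set equal to the upside-down gate (Z box on the A wire, control dot on the B wire).]

Hint: If they are equal on the CBS, they are equal period.] 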


11.5.2 More Separable Logic Gates 


The binary Hadamard gate, H ® H, is just one example of many possible separable 
gates which, by definition, are constructed as the tensor product of two unary opera- 
tors. Take any two single-qubit gates, say X and H. We form the product operator, 
and use tensor theory to immediately write down its matrix, 


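$$X \otimes H \;=\; \begin{pmatrix} 0 \cdot M_H & 1 \cdot M_H \\ 1 \cdot M_H & 0 \cdot M_H \end{pmatrix}
\;=\; \frac{1}{\sqrt{2}}\begin{pmatrix}
 0 & 0 & 1 & 1 \\
 0 & 0 & 1 & -1 \\
 1 & 1 & 0 & 0 \\
 1 & -1 & 0 & 0
\end{pmatrix} .$$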

[Exercise. Verify unitarity. ] 
Vocabulary. Separable operators are also called local operators. 


Separable Operators on Separable States. All separable operators take 
separable states to other separable states; therefore, they preserve locality (which is why 
they are also known as local operators). In particular, their action on the CBS results 
in separable outputs. 


One of the notational options we had for the binary Hadamard gate, 

[Circuit diagram: two wires, each passing through its own H gate.]

provided a visual representation of this locality. It correctly pictures the non-causal 
separation between the two channels when separable inputs are presented to the gate. 
This is true for all separable operators like our current $X \otimes H$, 

[Circuit diagram: the upper wire carries $|\psi\rangle$ through an X gate to $X|\psi\rangle$; the lower wire carries $|\varphi\rangle$ through an H gate to $H|\varphi\rangle$.]

demonstrating the complete isolation of each channel, but only when separable states 
are sent to the gate. If an entangled state goes in, an entangled state will come out. 
The channels are still separate, but the input and output must be displayed as unified 
states, 

[Circuit diagram: the bipartite state $|\psi\rangle^2$ enters the X gate (upper wire) and H gate (lower wire), and $(X \otimes H)\,|\psi\rangle^2$ leaves.]


Separable Operators on Entangled States 


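A single-qubit operator, $U$, applied to one qubit of an entangled state is equivalent to a separable binary operator like $1 \otimes U$ or $U \otimes 1$. Such "local" operations will generally modify both qubits of the entangled state, so a separable operator's effect is not restricted to only one channel for such states. To reinforce this concept, let $U$ be any unary operator on the A-space and form the separable operator

$$
U \otimes 1 : \; \mathcal{H}_A \otimes \mathcal{H}_B \;\longrightarrow\; \mathcal{H}_A \otimes \mathcal{H}_B .
$$

Let's apply this to a general $|\psi\rangle^2$, allowing the possibility that this is an entangled state.

$$
\begin{aligned}
(U \otimes 1)\,|\psi\rangle^2
&= [U \otimes 1]\big(\alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle\big) \\
&= \alpha\,(U|0\rangle)\,|0\rangle + \beta\,(U|0\rangle)\,|1\rangle + \gamma\,(U|1\rangle)\,|0\rangle + \delta\,(U|1\rangle)\,|1\rangle .
\end{aligned}
$$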


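Replacing $U|0\rangle$ and $U|1\rangle$ with their expansions along the CBS (and noting that the matrix elements, $U_{jk}$, are the weights), we get

$$
\begin{aligned}
(U \otimes 1)\,|\psi\rangle^2
&= \alpha\big(U_{00}|0\rangle + U_{10}|1\rangle\big)|0\rangle + \beta\big(U_{00}|0\rangle + U_{10}|1\rangle\big)|1\rangle \\
&\quad + \gamma\big(U_{01}|0\rangle + U_{11}|1\rangle\big)|0\rangle + \delta\big(U_{01}|0\rangle + U_{11}|1\rangle\big)|1\rangle .
\end{aligned}
$$

Now, distribute and collect terms for the four CBS tensors, to see that

$$
(U \otimes 1)\,|\psi\rangle^2 \;=\; (\alpha U_{00} + \gamma U_{01})\,|00\rangle \;+\; \cdots
$$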


and without even finishing the regrouping we see that the amplitude of |00) can be 
totally different from its original amplitude, proving that the new entangled state is 
different, and potentially just as entangled. 


[Exercise. Complete the unfinished expression I started for $(U \otimes 1)|\psi\rangle^2$, collect terms for the four CBS kets, and combine the square-magnitudes of the amplitudes of the terms containing $|0\rangle_B$ to get the probability of B measuring a "0". Do the same for $|1\rangle_B$ to get the probability of B measuring a "1". Might the probabilities of these measurements be affected by applying $U$ to the A-register?]

Vocabulary. This is called performing a local operation on one qubit of an 
entangled pair. 




The Matrix. It will be handy to have the matrix for the operator $U \otimes 1$ at the ready for future algorithms. It can be written down instantly using the rules for separable operator matrices (see the lesson on tensor products),

$$
U \otimes 1 \;=\;
\begin{pmatrix}
U_{00} & 0 & U_{01} & 0 \\
0 & U_{00} & 0 & U_{01} \\
U_{10} & 0 & U_{11} & 0 \\
0 & U_{10} & 0 & U_{11}
\end{pmatrix}.
$$


Example. Alice and Bob each hold one qubit of the entangled pair

$$
|\beta_{00}\rangle \;=\; \frac{|00\rangle + |11\rangle}{\sqrt2} \;=\; \frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix},
$$

which you'll notice I have named $|\beta_{00}\rangle$ for reasons that will be made clear later today when we get to the Bell states. Alice will hold the A-register qubit (on the left of each product) and Bob the B-register qubit, so you may wish to view the state using the notation

$$
\frac{|0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B}{\sqrt2} .
$$


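Alice sends her qubit (i.e., the A-register) through a local QNOT operator,

$$
X \;=\; \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
$$

and Bob does nothing. This describes the full local operator

$$
X \otimes 1
$$

applied to the entire bipartite state. We want to know the effect this has on the total entangled state, so we apply the matrix for $X \otimes 1$ to the state. Using our pre-computed matrix for the general $U \otimes 1$ with $U = X$, we get

$$
(X \otimes 1)\,|\beta_{00}\rangle
\;=\;
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
\frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\;=\;
\frac{1}{\sqrt2}\begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\;=\;
\frac{|01\rangle + |10\rangle}{\sqrt2} .
$$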
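The computation of $(X \otimes 1)$ acting on $(|00\rangle + |11\rangle)/\sqrt2$ can be checked in a few lines. This is a sketch in plain Python with helper names of my own choosing (`kron`, `kron_vec`, `matvec`); it computes the result once with the 4 × 4 matrix and once by distributing the operator over the two terms of the superposition, applying $X$ to the A-ket and the identity to the B-ket of each term.

```python
import math

def kron(A, B):
    """Kronecker product of two matrices (A-major ordering)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(u, v):
    """Tensor product of two coordinate vectors."""
    return [a * b for a in u for b in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]
ket0, ket1 = [1, 0], [0, 1]
bell = [s, 0, 0, s]                        # (|00> + |11>)/sqrt(2)

# Method 1: multiply by the 4x4 matrix of X tensor 1.
by_matrix = matvec(kron(X, I2), bell)

# Method 2: distribute over the superposition, term by term.
term0 = kron_vec(matvec(X, ket0), ket0)    # (X|0>) tensor |0> = |10>
term1 = kron_vec(matvec(X, ket1), ket1)    # (X|1>) tensor |1> = |01>
by_distribution = [s * (a + b) for a, b in zip(term0, term1)]

print(by_matrix)
print(by_distribution)
```

Both lists hold the coordinates of $(|01\rangle + |10\rangle)/\sqrt2$ in the order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.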




We could have gotten the same result, perhaps more quickly, by distributing $X \otimes 1$ over the superposition and using the identity for separable operators,

$$
[S \otimes T](v \otimes w) \;=\; S(v) \otimes T(w).
$$


Either way, the result will be used in our first quantum algorithm (superdense coding), 
so it’s worth studying carefully. To help you, here is an exercise. 


[Exercise. Using the entangled state $|\beta_{00}\rangle$ as input, prove that the following local operators applied on Alice's end produce the entangled output states shown:

• $(Z \otimes 1)\,|\beta_{00}\rangle = (|00\rangle - |11\rangle)/\sqrt2$
• $(iY \otimes 1)\,|\beta_{00}\rangle = (|01\rangle - |10\rangle)/\sqrt2$

and, for completeness, the somewhat trivial

• $(1 \otimes 1)\,|\beta_{00}\rangle = (|00\rangle + |11\rangle)/\sqrt2$.]


11.6 The Born Rule and Partial Collapse 


We've had some experience with the famous entangled state

$$
|\beta_{00}\rangle \;=\; \frac{|0\rangle|0\rangle + |1\rangle|1\rangle}{\sqrt2} .
$$

If we measure the A-register of this state, it will of course collapse to either $|0\rangle_A$ or $|1\rangle_A$, as all good single qubits must. That, in turn, forces the B-register into its version of that same state, $|0\rangle_B$ or $|1\rangle_B$, meaning if A collapsed to $|0\rangle_A$, B would collapse to $|0\rangle_B$, and likewise for $|1\rangle$; there simply are no terms present in $|\beta_{00}\rangle$'s CBS expansion in which A and B are in different states. In more formal language, the CBS terms in which the A qubit and B qubit differ have zero amplitude and therefore zero collapse probability.


To make sure there is no doubt in your mind, consider a different state,

$$
|\psi\rangle^2 \;=\; \frac{|0\rangle|1\rangle + |1\rangle|0\rangle}{\sqrt2} .
$$

This time, if we measured the A-register and it collapsed to $|0\rangle_A$, that would force the B-register to collapse to $|1\rangle_B$.

[Exercise. Explain this last statement. What happens to the B-register if we measure the A-register and it collapses to $|1\rangle_A$?]


All of this is an example of a more general phenomenon in which we measure one 
qubit of an entangled bipartite state. 




11.6.1 The Born Rule for a Two-Qubit System 


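Take the most general state

$$
|\psi\rangle^2 \;=\; \alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle \;=\; \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
$$

and rearrange it so that either the A-kets or B-kets are factored out of common terms. Let's factor the A-kets for this illustration.

$$
|\psi\rangle^2 \;=\; |0\rangle_A\big(\alpha|0\rangle_B + \beta|1\rangle_B\big) \;+\; |1\rangle_A\big(\gamma|0\rangle_B + \delta|1\rangle_B\big).
$$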


(I labeled the state spaces of each ket to reinforce which kets belong to which register, 
but position implies this information even without the labels. I will often label a 
particular step in a long computation when I feel it helps, leaving the other steps 
unlabeled.) 


What happens if we measure A and get a "0"? Since there is only one term which matches this state, namely,

$$
|0\rangle\big(\alpha|0\rangle + \beta|1\rangle\big),
$$

we are forced to conclude that the B-register is left in the non-normalized state

$$
\alpha|0\rangle + \beta|1\rangle .
$$
There are a couple things that may be irritating you at this point. 


1. We don’t actually have a QM postulate that seems to suggest this claim. 


2. We are not comfortable that B’s imputed state is not normalized. 


We are on firm ground, however, because when the postulates of quantum mechanics are presented in their full generality, both of these concerns are addressed. The fifth postulate of QM (Trait #7), which addresses post-measurement collapse, has a generalization sometimes called the generalized Born rule. For the present, we'll satisfy ourselves with a version that applies only to a bipartite state's one-register measurement. We'll call it the ...

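Trait #15 (Born Rule for Bipartite States): If a bipartite state is factored relative to the A-register,

$$
|\psi\rangle^2 \;=\; |0\rangle\big(\alpha|0\rangle + \beta|1\rangle\big) \;+\; |1\rangle\big(\gamma|0\rangle + \delta|1\rangle\big),
$$

a measurement of the A-register will cause the collapse of the B-register according to

$$
\begin{aligned}
A \searrow 0 \;&\Longrightarrow\; B \searrow \frac{\alpha|0\rangle + \beta|1\rangle}{\sqrt{|\alpha|^2 + |\beta|^2}}, \\
A \searrow 1 \;&\Longrightarrow\; B \searrow \frac{\gamma|0\rangle + \delta|1\rangle}{\sqrt{|\gamma|^2 + |\delta|^2}}.
\end{aligned}
$$

Note how this handles the non-normality of the state $\alpha|0\rangle + \beta|1\rangle$: we divide through by the norm $\sqrt{|\alpha|^2 + |\beta|^2}$. (The same is seen for the alternative state $\gamma|0\rangle + \delta|1\rangle$.)

Vocabulary. We'll call this simply the "Born Rule."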


Trait #15 has a partner which tells us what happens if we first factor out the B-ket and measure the B-register.

[Exercise. State the Born Rule when we factor and measure the B-register.]


Checking the Born Rule in Simple Cases 


You should always confirm your understanding of a general rule by trying it out on 
simple cases to which you already have an answer. 


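Example. The state we encountered,

$$
|\psi\rangle^2 \;=\; \frac{|0\rangle|0\rangle + |1\rangle|1\rangle}{\sqrt2},
$$

has

$$
\alpha = \delta = \frac{1}{\sqrt2} \quad\text{and}\quad \beta = \gamma = 0,
$$

so if we measure the A-register and find it to be in the state $|0\rangle_A$ (by a measurement of "0" on that register), the Born rule tells us that the state remaining in the B-register should be

$$
\frac{\alpha|0\rangle + \beta|1\rangle}{\sqrt{|\alpha|^2 + |\beta|^2}}
\;=\;
\frac{\tfrac{1}{\sqrt2}|0\rangle + 0\cdot|1\rangle}{\sqrt{\left(\tfrac{1}{\sqrt2}\right)^2 + 0^2}}
\;=\; |0\rangle,
$$

which is what we concluded on more elementary grounds. ✓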


[Exercise. Use this technique to prove that if we start in the same state and, instead, measure the B-register with a result of "1," then the A-register is left in the state $|1\rangle_A$ with certainty.]


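Example. Let's take a somewhat less trivial bipartite state. Consider

$$
|\psi\rangle^2 \;=\; \frac{|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle - |1\rangle|1\rangle}{2},
$$

and imagine that we test the B-register and find that it "decided" to collapse to $|0\rangle_B$. To see what state this leaves the A-register in, factor out the B-kets of the original to get an improved view of $|\psi\rangle^2$,

$$
|\psi\rangle^2 \;=\; \left(\frac{|0\rangle_A + |1\rangle_A}{2}\right)|0\rangle_B \;+\; \left(\frac{|0\rangle_A - |1\rangle_A}{2}\right)|1\rangle_B .
$$

Examination of this expression tells us that the A-register state corresponding to $|0\rangle_B$ is some normalized representation of the vector $|0\rangle + |1\rangle$. Let's see if the Born rule gives us that result. The expression's four scalars are

$$
\alpha = \beta = \gamma = \tfrac12 \quad\text{and}\quad \delta = -\tfrac12,
$$

so a B-register collapse to $|0\rangle_B$ will, according to the Born rule, leave the A-register in the state

$$
\frac{\alpha|0\rangle + \gamma|1\rangle}{\sqrt{|\alpha|^2 + |\gamma|^2}}
\;=\;
\frac{\tfrac12|0\rangle + \tfrac12|1\rangle}{\sqrt{(1/2)^2 + (1/2)^2}}
\;=\;
\frac{|0\rangle + |1\rangle}{\sqrt2},
$$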


again, the expected normalized state. 
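The two checks above follow one recipe: pick out the two amplitudes that share the measured value, then divide by their norm. Here is a minimal sketch of that recipe in plain Python. The function name `born_collapse` is hypothetical; it also returns the squared norm of the selected pair, which is the probability of the measured outcome (a standard fact that the trait as stated does not spell out).

```python
import math

def born_collapse(state, register, outcome):
    """Born rule for a bipartite state.

    state    : [alpha, beta, gamma, delta], amplitudes of |00>, |01>, |10>, |11>
    register : "A" or "B", the register that was measured
    outcome  : 0 or 1, the measured value
    Returns (probability of that outcome, normalized state of the other register).
    """
    alpha, beta, gamma, delta = state
    if register == "A":
        rest = [alpha, beta] if outcome == 0 else [gamma, delta]
    else:
        rest = [alpha, gamma] if outcome == 0 else [beta, delta]
    norm = math.sqrt(sum(abs(z) ** 2 for z in rest))
    if norm == 0:
        raise ValueError("this outcome has zero probability")
    return norm ** 2, [z / norm for z in rest]

s = 1 / math.sqrt(2)

# First example: (|00> + |11>)/sqrt(2), measure A and get "0".
prob_1, b_state = born_collapse([s, 0, 0, s], "A", 0)

# Second example: (|00> + |01> + |10> - |11>)/2, measure B and get "0".
prob_2, a_state = born_collapse([0.5, 0.5, 0.5, -0.5], "B", 0)

print(prob_1, b_state)
print(prob_2, a_state)
```

Up to floating-point rounding, the first call leaves B in $|0\rangle$ and the second leaves A in $(|0\rangle + |1\rangle)/\sqrt2$, each with probability 1/2.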


[Exercise. Show that a measurement of $B \searrow 1$ for the same state results in an A-register collapse to $(|0\rangle - |1\rangle)/\sqrt2$.]


Application of Born Rule to a Gate Output 


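The Born rule gets used in many important quantum algorithms, so there's no danger of over-doing our practice. Let's take the separable gate $1 \otimes H$, whose matrix you should (by now) be able to write down blindfolded,

$$
1 \otimes H \;=\; \frac{1}{\sqrt2}
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & -1
\end{pmatrix}.
$$

Hand it the most general $|\psi\rangle^2 = (\alpha, \beta, \gamma, \delta)^T$, which will produce gate output

$$
\frac{1}{\sqrt2}
\begin{pmatrix}
\alpha + \beta \\ \alpha - \beta \\ \gamma + \delta \\ \gamma - \delta
\end{pmatrix}.
$$

We measure the B register and get a "1". To see what's left in the A-register, we factor the B-kets at the output,

$$
(1 \otimes H)\,|\psi\rangle^2 \;=\; \frac{1}{\sqrt2}\Big[\big((\alpha+\beta)|0\rangle + (\gamma+\delta)|1\rangle\big)|0\rangle \;+\; \big((\alpha-\beta)|0\rangle + (\gamma-\delta)|1\rangle\big)|1\rangle\Big].
$$

(At this point, we have to pause to avoid notational confusion. The "α" of the Born rule is actually our current $(\alpha - \beta)/\sqrt2$, with similar unfortunate name conflicts for the other three Born variables, all of which I'm sure you can handle. The common factor $1/\sqrt2$ cancels when we normalize.) The Born rule says that the A-register will collapse to

$$
\frac{(\alpha-\beta)|0\rangle + (\gamma-\delta)|1\rangle}{\sqrt{|\alpha-\beta|^2 + |\gamma-\delta|^2}},
$$

which is as far as we need to go, although it won't hurt for you to do the ...

[Exercise. Simplify this and show that it is a normalized vector.]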


11.7 Multi-Gate Circuits 


We have all the ingredients to make countless quantum circuits from the basic binary 
gates introduced above. We’ll start with the famous Bell states. 


11.7.1 A Circuit that Produces Bell States 


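There are four pairwise entangled states, known as Bell states or EPR pairs (for the physicists Bell, Einstein, Podolsky and Rosen who discovered their special qualities). We've already met one,

$$
|\beta_{00}\rangle \;=\; \frac{|00\rangle + |11\rangle}{\sqrt2},
$$

and the other three are

$$
|\beta_{01}\rangle = \frac{|01\rangle + |10\rangle}{\sqrt2}, \qquad
|\beta_{10}\rangle = \frac{|00\rangle - |11\rangle}{\sqrt2} \qquad\text{and}\qquad
|\beta_{11}\rangle = \frac{|01\rangle - |10\rangle}{\sqrt2}.
$$

The notation I have adopted above is after Nielsen and Chuang, but physicists also use the alternative symbols

$$
|\beta_{00}\rangle \to |\Phi^+\rangle, \qquad
|\beta_{01}\rangle \to |\Psi^+\rangle, \qquad
|\beta_{10}\rangle \to |\Phi^-\rangle \qquad\text{and}\qquad
|\beta_{11}\rangle \to |\Psi^-\rangle .
$$

In addition, I am not using the superscript to denote a bipartite state ($|\beta_{00}\rangle^2$), since the double index on $|\beta_{00}\rangle$ tells the story.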


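The circuit that produces these four states using the standard CBS basis for $\mathcal{H}_{(2)}$ as inputs is

[Circuit diagram: $|x\rangle$ enters the A-wire, passes through $H$ and then acts as the control of a CNOT; $|y\rangle$ enters the B-wire as the CNOT target; the joint output is $|\beta_{xy}\rangle$.]

which can be seen as a combination of a unary Hadamard gate with a CNOT gate. We could emphasize that this is a binary gate in its own right by calling it BELL and boxing it,

[Circuit diagram: $|x\rangle$ and $|y\rangle$ enter a single box labeled BELL; the joint output is $|\beta_{xy}\rangle$.]

In concrete terms, the algebraic expression,

$$
\text{BELL}\big(|x\rangle|y\rangle\big) \;=\; |\beta_{xy}\rangle
$$

is telling us that

$$
\begin{aligned}
\text{BELL}\big(|0\rangle|0\rangle\big) &= |\beta_{00}\rangle, \\
\text{BELL}\big(|0\rangle|1\rangle\big) &= |\beta_{01}\rangle, \\
\text{BELL}\big(|1\rangle|0\rangle\big) &= |\beta_{10}\rangle \quad\text{and} \\
\text{BELL}\big(|1\rangle|1\rangle\big) &= |\beta_{11}\rangle .
\end{aligned}
$$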


The Bell States Form an Orthonormal Basis 


When studying unary quantum gates, we saw that they take orthonormal bases to orthonormal bases. The same argument (unitarity) proves this for a Hilbert space of any dimension. Consequently, the Bell states form an orthonormal basis for $\mathcal{H}_{(2)}$.


The Matrix for BELL 


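The matrix for this gate can be constructed using the various techniques we have already studied. For example, you can use the standard linear algebra approach of building columns for the matrix from the four outputs of the gate. For variety, let's take a different path. The A-register Hadamard has a plain quantum wire below it, meaning the B-register is implicitly performing an identity operator at that point. So we could write the gate using the equivalent symbolism

[Circuit diagram: $H$ on the A-wire above an explicit identity gate $1$ on the B-wire, followed by the CNOT.]

a visual that demonstrates the application of two known matrices in series, that is,

$$
\begin{aligned}
\text{BELL} \;=\; (\text{CNOT})(H \otimes 1)
&= \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\frac{1}{\sqrt2}
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & -1
\end{pmatrix} \\
&= \frac{1}{\sqrt2}
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & -1 \\
1 & 0 & -1 & 0
\end{pmatrix}.
\end{aligned}
$$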


At this point, you should do a few things. 
[Exercise. 


1. Explain why the order of the matrices is opposite the order of the gates in the 
diagram. 


2. Confirm the matrix multiplication, above. 
3. Confirm that the matrix is unitary. 


4. Confirm that the matrix gives the four Bell states when one presents the four 
CBS states as inputs. (See below for a hint.) 


5. Demonstrate that BELL is not separable. Hint: What do we know about the matrix of a separable operator?]


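I'll do item 4 for the input state $|10\rangle$, just to get the blood flowing.

$$
\text{BELL}\,|10\rangle
\;=\;
\frac{1}{\sqrt2}
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & -1 \\
1 & 0 & -1 & 0
\end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
\;=\;
\frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}
\;=\;
\frac{|00\rangle - |11\rangle}{\sqrt2}
\;=\; |\beta_{10}\rangle .
$$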
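The same bookkeeping can be done for all four CBS inputs at once: the product CNOT · (H ⊗ 1) is formed with the Hadamard factor on the right (it acts first), and its columns are the gate outputs. The sketch below uses plain Python; the helper names and the dictionary `bell_states` are illustrative choices of mine, not course notation.

```python
import math

def kron(A, B):
    """Kronecker product of two matrices (A-major ordering)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

# The right-most factor acts first: Hadamard on A, then the CNOT.
BELL = matmul(CNOT, kron(H, I2))

# Column number xy (read as a binary integer) is the output for CBS input |xy>.
bell_states = {}
for index, label in enumerate(["00", "01", "10", "11"]):
    bell_states[label] = [BELL[row][index] for row in range(4)]
    print(label, bell_states[label])
```

Each printed list gives the amplitudes of $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ in the corresponding output state.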


Four Bell States from One 


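We did an example earlier that demonstrated that

$$
(X \otimes 1)\,|\beta_{00}\rangle \;=\; \big(|01\rangle + |10\rangle\big)/\sqrt2,
$$

and followed it with some exercises. We can now list those results in the language of the EPR pairs.

$$
\begin{aligned}
(1 \otimes 1)\,|\beta_{00}\rangle &= |\beta_{00}\rangle \\
(X \otimes 1)\,|\beta_{00}\rangle &= |\beta_{01}\rangle \\
(Z \otimes 1)\,|\beta_{00}\rangle &= |\beta_{10}\rangle \\
(iY \otimes 1)\,|\beta_{00}\rangle &= |\beta_{11}\rangle
\end{aligned}
$$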


This might be a good time to appreciate one of today’s earlier observations: a local 
(read “separable”) operation on an entangled state changes the entire state, affecting 
both qubits of the entangled pair. 


BELL as a Basis Transforming Operator 


The operator BELL takes natural CBS kets to the four Bell kets, the latter shown to be an orthonormal basis for $\mathcal{H}_{(2)}$. But that's exactly what we call a basis transforming operator. Viewed in this light, BELL, like $H^{\otimes 2}$, can be used when we want to change our basis. Unlike $H^{\otimes 2}$, however, BELL is not separable (a recent exercise) and not its own inverse (to be proven in a few minutes).


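Measuring Along the Bell Basis. We saw that to measure along any basis, we find the binary operator, call it $S$, that takes the z-basis to the other basis and use $S^\dagger$ prior to measurement.

[Circuit diagram: the two-wire output of a circuit $U$ passes through $S^\dagger$, after which both wires are measured.]

Thus, to measure along the BELL basis (and we will, next lecture), we plug in BELL for $S$,

[Circuit diagram: the two-wire output of a circuit $U$ passes through $\text{BELL}^\dagger$, after which both wires are measured.]

And what is $\text{BELL}^\dagger$? Using the adjoint conversion rules, and remembering that the order of operators in the circuit is opposite of that in the algebra, we find

$$
\text{BELL}^\dagger \;=\; \big[(\text{CNOT})(H \otimes 1)\big]^\dagger \;=\; (H \otimes 1)^\dagger\,(\text{CNOT})^\dagger \;=\; (H \otimes 1)\,(\text{CNOT}),
$$

the final equality a consequence of the fact that CNOT and $H \otimes 1$ are both self-adjoint.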




[Exercise. Prove this last claim using the matrices for these two binary gates.] 


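In other words, we just reverse the order of the two sub-operators that comprise BELL. This makes the circuit diagram for $\text{BELL}^\dagger$ come out to be

[Circuit diagram: the CNOT (A-wire control, B-wire target) comes first, followed by $H$ on the A-wire and an identity on the B-wire.]

The matrix for $\text{BELL}^\dagger$ is easy to derive since we just take the transpose (everything's real so no complex conjugation necessary):

$$
\text{BELL}^\dagger
\;=\;
\frac{1}{\sqrt2}
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & -1 \\
1 & 0 & -1 & 0
\end{pmatrix}^{T}
\;=\;
\frac{1}{\sqrt2}
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
1 & 0 & 0 & -1 \\
0 & 1 & -1 & 0
\end{pmatrix}.
$$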
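Two claims made about BELL can be spot-checked numerically: its adjoint (here simply the transpose) undoes it, while BELL applied twice does not return the identity, so it is not its own inverse. This is a rough sketch with helper names of my own; a floating-point check illustrates the claims but does not replace the algebraic argument.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_identity(M, tol=1e-12):
    """True when M equals the identity matrix up to rounding error."""
    n = len(M)
    return all(math.isclose(M[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
BELL = [[s * x for x in row] for row in [[1, 0, 1, 0],
                                         [0, 1, 0, 1],
                                         [0, 1, 0, -1],
                                         [1, 0, -1, 0]]]
BELL_dagger = transpose(BELL)          # all entries are real

adjoint_inverts = is_identity(matmul(BELL_dagger, BELL))
self_inverse = is_identity(matmul(BELL, BELL))
print(adjoint_inverts, self_inverse)
```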


11.7.2 A Circuit that Creates an Upside-Down CNOT 


Earlier today we demonstrated that the CNOT gate, when presented and measured relative to the x-basis, behaved "upside down". Now, I'm going to show you how to turn this into a circuit that creates an "upside down" CNOT in the standard z-basis CBS.

First, the answer. Here is your circuit, which we'll call "$C\!\uparrow\!\text{NOT}$," because it is controlled from bottom-up:

[Circuit diagram: the A-wire carries $|x\rangle$ through $H$, the control dot of a CNOT, and $H$, emerging as $|x \oplus y\rangle$; the B-wire carries $|y\rangle$ through $H$, the CNOT target $\oplus$, and $H$, emerging as $|y\rangle$.]

Notice that it has a right-side up (normal) CNOT gate at its core.

The first thing we can do is verify the claim indicated at the output registers. As we just did with the BELL gate, we can do this most directly by multiplying the matrices of the three binary sub-gates evident by grouping this gate as

[Circuit diagram: the same circuit grouped into three binary stages: $H \otimes H$, CNOT, and $H \otimes H$.]


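That corresponds to the matrix product

$$
\begin{aligned}
C\!\uparrow\!\text{NOT}
&= (H \otimes H)\,(\text{CNOT})\,(H \otimes H) \\
&= \frac12
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\frac12
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix} \\
&= \frac14
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1 \\
1 & 1 & -1 & -1
\end{pmatrix}
\;=\;
\frac14
\begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 0 & 0 & 4 \\
0 & 0 & 4 & 0 \\
0 & 4 & 0 & 0
\end{pmatrix} \\
&=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix},
\end{aligned}
$$

so that

$$
\begin{aligned}
C\!\uparrow\!\text{NOT} : \; |00\rangle &\longmapsto |00\rangle \\
C\!\uparrow\!\text{NOT} : \; |01\rangle &\longmapsto |11\rangle \\
C\!\uparrow\!\text{NOT} : \; |10\rangle &\longmapsto |10\rangle \\
C\!\uparrow\!\text{NOT} : \; |11\rangle &\longmapsto |01\rangle
\end{aligned}
$$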


We can now plainly see that the B-register is controlling the A-register’s QNOT 
operation, as claimed. 
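For readers who want to confirm the three-matrix product without hand arithmetic, here is a short sketch in plain Python (helper and variable names are mine). It multiplies $(H \otimes H)\,\text{CNOT}\,(H \otimes H)$, rounds away floating-point noise, and reads the CBS mapping off the columns of the result.

```python
import math

def kron(A, B):
    """Kronecker product of two matrices (A-major ordering)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

HH = kron(H, H)
up_cnot = matmul(matmul(HH, CNOT), HH)

# Entries are 0 or 1 up to rounding error, so rounding recovers the exact matrix.
rounded = [[round(x) for x in row] for row in up_cnot]

labels = ["00", "01", "10", "11"]
# Column c holds the image of CBS ket number c; find the row containing the 1.
mapping = {labels[c]: labels[r]
           for c in range(4) for r in range(4) if rounded[r][c] == 1}

for row in rounded:
    print(row)
print(mapping)
```

In the printed mapping the second bit (B) never changes, while the first bit (A) flips exactly when B is 1.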


Interpreting the Upside-Down CNOT Circuit 


We now have two different studies of the upside-down CNOT. The first study concerned the "naked" CNOT and resulted in the observation that, relative to the x-CBS, the B-register controlled the QNOT ($X$) operation on the A-register, thus it looks upside-down if you are an x-basis ket. The second and current study concerns a new circuit that has the CNOT surrounded by Hadamard gates and, taken as a whole, is a truly upside-down CNOT viewed in the ordinary z-CBS. How do these two studies compare?

The key to understanding this comes from our recent observation that $H^{\otimes 2}$ can be viewed as a way to convert between the x-basis and the z-basis (in either direction since it's its own inverse).

Thus, we use the first third of the three-part circuit to let $H^{\otimes 2}$ take z-CBS kets to x-CBS kets. Next, we allow CNOT to act on the x-basis, which we saw from our earlier study caused the B-register to be the control qubit, because and only because we are looking at x-basis kets. The output will be x-CBS kets (since we put x-CBS kets into the central CNOT). Finally, in the last third of the circuit we let $H^{\otimes 2}$ convert the x-CBS kets back to z-CBS kets.




11.8 More than Two Qubits 


We will officially introduce n-qubit systems for n > 2 next week, but we can find ways to use binary qubit gates in circuits that have more than two inputs immediately as long as we operate on no more than two qubits at a time. This will lead to our first quantum algorithms.


11.8.1 Order-3 Tensor Products 


If a second order product space is the tensor product of two vector spaces,

$$
W \;=\; A \otimes B,
$$

then it's easy to believe that a third order product space would be constructed from three component spaces,

$$
W \;=\; A \otimes B \otimes C.
$$

This can be formalized by relying on the second order construction, and applying it twice, e.g.,

$$
W \;=\; (A \otimes B) \otimes C.
$$

It's actually less confusing to go through our order-2 tensor product development and just extend all the definitions so that they work for three component spaces. For example, taking a page from our tensor product lecture, we would start with the formal vector symbols

$$
a \otimes b \otimes c
$$

and produce all finite sums of these things. Then, as we did for the $A \otimes B$ product space, we could define tensor addition and scalar multiplication. The basic concepts extend directly to third order product spaces. I'll cite a few highlights.


• The dimension of the product space, $W$, is the product of the three dimensions,

$$
\dim(W) \;=\; \dim(A) \cdot \dim(B) \cdot \dim(C),
$$

and $W$ has as its basis

$$
\{\, w_{jkl} \equiv a_j \otimes b_k \otimes c_l \,\},
$$

where $\{a_j\}$, $\{b_k\}$ and $\{c_l\}$ are the bases of the three component spaces.

• The vectors (tensors) in the product space are uniquely expressed as superpositions of these basis tensors so that a typical tensor in $W$ can be written

$$
w \;=\; \sum_{j,k,l} c_{jkl}\,\big(a_j \otimes b_k \otimes c_l\big),
$$

where $c_{jkl}$ are the amplitudes of the CBS kets, scalars which we had been naming $\alpha$, $\beta$, $\gamma$, etc. in a simpler era.

• A separable operator on the product space is one that arises from three component operators, $T_A$, $T_B$ and $T_C$, each defined on its respective component space, $A$, $B$ and $C$. This separable tensor operator is defined first by its action on separable order-3 tensors

$$
[T_A \otimes T_B \otimes T_C](a \otimes b \otimes c) \;=\; T_A(a) \otimes T_B(b) \otimes T_C(c)
$$

and since the basis tensors are of this form, that establishes the action of $T_A \otimes T_B \otimes T_C$ on the basis which in turn extends the action to the whole space.


If any of this seems hazy, I encourage you to refer back to the tensor product lecture and fill in details so that they extend to three component spaces.

[Exercise. Replicate the development of an order-2 tensor product space from our past lecture to order-3 using the above definitions as a guide.]


11.8.2 Three Qubit Systems 


Definition of Three Qubits. Three qubits are, collectively, the tensor product space $\mathcal{H} \otimes \mathcal{H} \otimes \mathcal{H}$, and the value of those qubits can be any tensor having unit length.

Vocabulary and Notation. Three qubits are often referred to as a tripartite system.

To distinguish the three identical component spaces, we sometimes use subscripts,

$$
\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C .
$$

The order of the tensor product, this time three, can be used to label the state space:

$$
\mathcal{H}_{(3)} .
$$


11.8.3 Tripartite Bases and Separable States 


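The tensor product of three 2-D vector spaces has dimension 2 × 2 × 2 = 8. Its inherited preferred basis tensors are the separable products of the component space vectors,

$$
\begin{aligned}
\big\{\; & |0\rangle \otimes |0\rangle \otimes |0\rangle,\;\; |0\rangle \otimes |0\rangle \otimes |1\rangle,\;\; |0\rangle \otimes |1\rangle \otimes |0\rangle,\;\; |0\rangle \otimes |1\rangle \otimes |1\rangle, \\
& |1\rangle \otimes |0\rangle \otimes |0\rangle,\;\; |1\rangle \otimes |0\rangle \otimes |1\rangle,\;\; |1\rangle \otimes |1\rangle \otimes |0\rangle,\;\; |1\rangle \otimes |1\rangle \otimes |1\rangle \;\big\}.
\end{aligned}
$$

The shorthand alternatives are

$$
\begin{array}{ccccccc}
|0\rangle \otimes |0\rangle \otimes |0\rangle & \longleftrightarrow & |0\rangle|0\rangle|0\rangle & \longleftrightarrow & |000\rangle & \longleftrightarrow & |0\rangle^3 \\
|0\rangle \otimes |0\rangle \otimes |1\rangle & \longleftrightarrow & |0\rangle|0\rangle|1\rangle & \longleftrightarrow & |001\rangle & \longleftrightarrow & |1\rangle^3 \\
|0\rangle \otimes |1\rangle \otimes |0\rangle & \longleftrightarrow & |0\rangle|1\rangle|0\rangle & \longleftrightarrow & |010\rangle & \longleftrightarrow & |2\rangle^3 \\
|0\rangle \otimes |1\rangle \otimes |1\rangle & \longleftrightarrow & |0\rangle|1\rangle|1\rangle & \longleftrightarrow & |011\rangle & \longleftrightarrow & |3\rangle^3 \\
|1\rangle \otimes |0\rangle \otimes |0\rangle & \longleftrightarrow & |1\rangle|0\rangle|0\rangle & \longleftrightarrow & |100\rangle & \longleftrightarrow & |4\rangle^3 \\
|1\rangle \otimes |0\rangle \otimes |1\rangle & \longleftrightarrow & |1\rangle|0\rangle|1\rangle & \longleftrightarrow & |101\rangle & \longleftrightarrow & |5\rangle^3 \\
|1\rangle \otimes |1\rangle \otimes |0\rangle & \longleftrightarrow & |1\rangle|1\rangle|0\rangle & \longleftrightarrow & |110\rangle & \longleftrightarrow & |6\rangle^3 \\
|1\rangle \otimes |1\rangle \otimes |1\rangle & \longleftrightarrow & |1\rangle|1\rangle|1\rangle & \longleftrightarrow & |111\rangle & \longleftrightarrow & |7\rangle^3
\end{array}
$$

The notation of the first two columns admits the possibility of labeling each of the component kets with the $\mathcal{H}$ from which it came, A, B or C,

$$
\begin{aligned}
|0\rangle_A \otimes |0\rangle_B \otimes |0\rangle_C \;&\longleftrightarrow\; |0\rangle_A|0\rangle_B|0\rangle_C \\
|0\rangle_A \otimes |0\rangle_B \otimes |1\rangle_C \;&\longleftrightarrow\; |0\rangle_A|0\rangle_B|1\rangle_C \\
&\text{etc.}
\end{aligned}
$$

The densest of the notations expresses the CBS ket as an integer from 0 to 7. We reinforce this correspondence and add the coordinate representation of each basis ket:

$$
\begin{array}{ccc}
|000\rangle & |0\rangle^3 & (1,0,0,0,0,0,0,0)^T \\
|001\rangle & |1\rangle^3 & (0,1,0,0,0,0,0,0)^T \\
|010\rangle & |2\rangle^3 & (0,0,1,0,0,0,0,0)^T \\
|011\rangle & |3\rangle^3 & (0,0,0,1,0,0,0,0)^T \\
|100\rangle & |4\rangle^3 & (0,0,0,0,1,0,0,0)^T \\
|101\rangle & |5\rangle^3 & (0,0,0,0,0,1,0,0)^T \\
|110\rangle & |6\rangle^3 & (0,0,0,0,0,0,1,0)^T \\
|111\rangle & |7\rangle^3 & (0,0,0,0,0,0,0,1)^T
\end{array}
$$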


Note that the "exponent 3" is needed mainly in the encoded form, since an integer representation for a CBS does not disclose its tensor order (3) to the reader, while the other representations clearly reveal that the context is three-qubits.
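The correspondence between the bit-string label, the integer label and the coordinate vector can be generated rather than memorized. The sketch below is illustrative only (the function name `cbs_ket` is my own): it builds the coordinate vector as a tensor product of single-qubit kets and reads the integer label from the binary string.

```python
def kron_vec(u, v):
    """Tensor product of two coordinate vectors."""
    return [a * b for a in u for b in v]

def cbs_ket(bits):
    """Return (integer label, coordinate vector) for a CBS ket such as '101'."""
    single = {"0": [1, 0], "1": [0, 1]}
    vec = [1]
    for b in bits:                  # left-most bit is the A-register
        vec = kron_vec(vec, single[b])
    return int(bits, 2), vec

for n in range(8):
    bits = format(n, "03b")         # '000', '001', ..., '111'
    label, vec = cbs_ket(bits)
    print(bits, label, vec)
```

The single 1 in each vector sits at the position given by the integer label, counting from 0, which is why the encoded form needs the "exponent 3" to say how long the vector is.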


The Channel Labels. We will use the same labeling scheme as before, but more input lines means more labels. For three lines, we would name the registers A, B and C, as in the hypothetical circuit

[Circuit diagram: three wires labeled "A-register in/out", "B-register in/out" and "C-register in/out"; a gate $U$ sits on the A and B wires and a gate $H$ sits on the C wire.]


Working with Three Qubits 


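As I mentioned in the introduction, the current regime allows three or more inputs as long as we only apply operators to two at a time. Let's look at a circuit that meets that condition.

[Circuit diagram: a CNOT acts on the A and B registers, followed by $H^{\otimes 2}$ acting on the B and C registers; access points P, Q and R mark the input, the state after the CNOT, and the output.]

This circuit is receiving an order-3 tensor at its inputs. The first two registers, A and B, get a (potentially) entangled bipartite state $|\psi\rangle^2$ and the third, C, gets a single qubit, $|\varphi\rangle$. We analyze the circuit at the three access points, P, Q and R.

A Theoretical Approach. We'll first do an example that is more general than we normally need, but provides a surefire fallback technique if we are having a hard time. We'll give the input states the most general form,

$$
|\psi\rangle^2 \;=\; \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
\qquad\text{and}\qquad
|\varphi\rangle \;=\; \begin{pmatrix} \eta \\ \xi \end{pmatrix}.
$$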


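Access Point P. The initial tripartite tensor is

$$
|\psi\rangle^2\,|\varphi\rangle \;=\; \big(\alpha\eta,\; \alpha\xi,\; \beta\eta,\; \beta\xi,\; \gamma\eta,\; \gamma\xi,\; \delta\eta,\; \delta\xi\big)^T .
$$

Access Point Q. The first gate is a CNOT applied only to the entangled $|\psi\rangle^2$, so the overall effect is just

$$
(\text{CNOT} \otimes 1)\big(|\psi\rangle^2\,|\varphi\rangle\big) \;=\; \big(\text{CNOT}\,|\psi\rangle^2\big) \otimes |\varphi\rangle
$$

by the rule for applying a separable operator to a separable state. (Although the first two qubits are entangled, when the input is grouped as a tensor product of $\mathcal{H}_{(2)} \otimes \mathcal{H}$, $|\psi\rangle^2 \otimes |\varphi\rangle$ is recognized as a separable second-order tensor.)

Applying the CNOT explicitly, we get

$$
\big(\text{CNOT}\,|\psi\rangle^2\big) \otimes |\varphi\rangle
\;=\;
\left[\text{CNOT}\begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}\right] \otimes \begin{pmatrix} \eta \\ \xi \end{pmatrix}
\;=\;
\begin{pmatrix} \alpha \\ \beta \\ \delta \\ \gamma \end{pmatrix} \otimes \begin{pmatrix} \eta \\ \xi \end{pmatrix}
\;=\;
\big(\alpha\eta,\; \alpha\xi,\; \beta\eta,\; \beta\xi,\; \delta\eta,\; \delta\xi,\; \gamma\eta,\; \gamma\xi\big)^T .
$$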

We needed to multiply out the separable product in preparation for the next phase 
of the circuit, which appears to operate on the last two registers, B and C. 


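Access Point R. The final operator is local to the last two registers, and takes the form

$$
1 \otimes H^{\otimes 2} .
$$

Although it feels as though we might be able to take a short cut, the intermediate tripartite state and final operator are sufficiently complicated to warrant treating it as a fully entangled three-qubit state and just doing the big matrix multiplication.

$$
\frac12
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & -1 & 1 & -1 & 0 & 0 & 0 & 0 \\
1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \\
1 & -1 & -1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & -1 & 1 & -1 \\
0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 \\
0 & 0 & 0 & 0 & 1 & -1 & -1 & 1
\end{pmatrix}
\begin{pmatrix}
\alpha\eta \\ \alpha\xi \\ \beta\eta \\ \beta\xi \\ \delta\eta \\ \delta\xi \\ \gamma\eta \\ \gamma\xi
\end{pmatrix}
\;=\;
\frac12
\begin{pmatrix}
\alpha\eta + \alpha\xi + \beta\eta + \beta\xi \\
\alpha\eta - \alpha\xi + \beta\eta - \beta\xi \\
\alpha\eta + \alpha\xi - \beta\eta - \beta\xi \\
\alpha\eta - \alpha\xi - \beta\eta + \beta\xi \\
\delta\eta + \delta\xi + \gamma\eta + \gamma\xi \\
\delta\eta - \delta\xi + \gamma\eta - \gamma\xi \\
\delta\eta + \delta\xi - \gamma\eta - \gamma\xi \\
\delta\eta - \delta\xi - \gamma\eta + \gamma\xi
\end{pmatrix},
$$

which could be written as the less compact basis expansion

$$
\begin{aligned}
\frac12 \Big[\; & (\alpha\eta + \alpha\xi + \beta\eta + \beta\xi)\,|000\rangle + (\alpha\eta - \alpha\xi + \beta\eta - \beta\xi)\,|001\rangle \\
+\; & (\alpha\eta + \alpha\xi - \beta\eta - \beta\xi)\,|010\rangle + (\alpha\eta - \alpha\xi - \beta\eta + \beta\xi)\,|011\rangle \\
+\; & (\delta\eta + \delta\xi + \gamma\eta + \gamma\xi)\,|100\rangle + (\delta\eta - \delta\xi + \gamma\eta - \gamma\xi)\,|101\rangle \\
+\; & (\delta\eta + \delta\xi - \gamma\eta - \gamma\xi)\,|110\rangle + (\delta\eta - \delta\xi - \gamma\eta + \gamma\xi)\,|111\rangle \;\Big].
\end{aligned}
$$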


This isn’t very enlightening because the coefficients are so general. But a concrete 
example shows how the process can be streamlined. 

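A Practical Approach. Usually we have specific and nicely symmetric input tensors. In this case, let's pretend we know that

$$
|\psi\rangle^2 \;=\; \frac{|00\rangle + |11\rangle}{\sqrt2}
\qquad\text{and}\qquad
|\varphi\rangle \;=\; |1\rangle .
$$

For reference, I'll repeat the circuit for this specific input.

[Circuit diagram: the same circuit with $(|00\rangle + |11\rangle)/\sqrt2$ entering the A and B registers and $|1\rangle$ entering the C register.]

Access Point P. We begin in this state

$$
\left(\frac{|00\rangle + |11\rangle}{\sqrt2}\right)|1\rangle .
$$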


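Access Point Q. Apply the two-qubit CNOT gate to registers A and B, which has the overall effect of applying $\text{CNOT} \otimes 1$ to the full tripartite tensor.

$$
\begin{aligned}
(\text{CNOT} \otimes 1)\big(|\psi\rangle^2\,|\varphi\rangle\big)
&= \big(\text{CNOT}\,|\psi\rangle^2\big) \otimes \big(1\,|\varphi\rangle\big) \\
&= \left(\text{CNOT}\,\frac{|00\rangle + |11\rangle}{\sqrt2}\right) \otimes \big(1\,|1\rangle\big) \\
&= \frac{\text{CNOT}\,|00\rangle + \text{CNOT}\,|11\rangle}{\sqrt2} \otimes |1\rangle \\
&= \frac{|00\rangle + |10\rangle}{\sqrt2} \otimes |1\rangle
\;=\; \frac{|001\rangle + |101\rangle}{\sqrt2},
\end{aligned}
$$

a state that can be factored to our advantage:

$$
\frac{|001\rangle + |101\rangle}{\sqrt2} \;=\; \left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes |01\rangle .
$$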


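Access Point R. Finally we apply the second two-qubit gate $H^{\otimes 2}$ to the B and C registers, which has the overall effect of applying $1 \otimes H^{\otimes 2}$ to the full tripartite state. The factorization we found makes this an easy separable proposition,

$$
\big[1 \otimes H^{\otimes 2}\big]\left(\frac{|0\rangle + |1\rangle}{\sqrt2} \otimes |01\rangle\right) \;=\; \left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes H^{\otimes 2}\,|01\rangle .
$$

Referring back to the second order Hadamard on the state in question, i.e.,

$$
H^{\otimes 2}\,|01\rangle \;=\; \frac{|00\rangle - |01\rangle + |10\rangle - |11\rangle}{2},
$$

we find that

$$
\big[1 \otimes H^{\otimes 2}\big]\left(\frac{|0\rangle + |1\rangle}{\sqrt2} \otimes |01\rangle\right)
\;=\;
\left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes \left(\frac{|00\rangle - |01\rangle + |10\rangle - |11\rangle}{2}\right),
$$

which can be factored into a fully separable

$$
\left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes \left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes \left(\frac{|0\rangle - |1\rangle}{\sqrt2}\right).
$$

On the other hand, if we had been keen enough to remember that $H^{\otimes 2}$ is just the separable $H \otimes H$, which has a special effect on a separable state like $|01\rangle = |0\rangle|1\rangle$, we could have gotten to the factored form faster using

$$
\begin{aligned}
\big[1 \otimes H^{\otimes 2}\big]\left(\frac{|0\rangle + |1\rangle}{\sqrt2} \otimes |01\rangle\right)
&= \frac{|0\rangle + |1\rangle}{\sqrt2} \otimes [H \otimes H]\big(|0\rangle|1\rangle\big) \\
&= \frac{|0\rangle + |1\rangle}{\sqrt2} \otimes H|0\rangle \otimes H|1\rangle \\
&= \left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes \left(\frac{|0\rangle + |1\rangle}{\sqrt2}\right) \otimes \left(\frac{|0\rangle - |1\rangle}{\sqrt2}\right).
\end{aligned}
$$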
The moral is, don’t worry about picking the wrong approach to a problem. If your 
math is sound, you'll get to the end zone either way. 
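A third route to the end zone is brute force: build the two 8 × 8 operators and push the input vector through them. The sketch below does this for the practical example (inputs $(|00\rangle + |11\rangle)/\sqrt2$ on A, B and $|1\rangle$ on C) and compares the result with the fully factored product. It is plain Python with helper names of my own (`state_P`, `state_Q`, `state_R` mirror the access points) and is meant as a cross-check, not as the course's method.

```python
import math

def kron(A, B):
    """Kronecker product of two matrices (A-major ordering)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(u, v):
    """Tensor product of two coordinate vectors."""
    return [a * b for a in u for b in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

psi = [s, 0, 0, s]        # (|00> + |11>)/sqrt(2) on registers A, B
phi = [0, 1]              # |1> on register C

state_P = kron_vec(psi, phi)                             # access point P
state_Q = matvec(kron(CNOT, I2), state_P)                # CNOT on A, B
state_R = matvec(kron(I2, kron(H, H)), state_Q)          # H tensor H on B, C

# Fully separable form found in the text: |+> tensor |+> tensor |->.
plus, minus = [s, s], [s, -s]
factored = kron_vec(plus, kron_vec(plus, minus))

agree = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(state_R, factored))
print(agree)
```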


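Double Checking Our Work. Let's pause to see how this compares with the general formula. Relative to the general $|\psi\rangle^2$ and $|\varphi\rangle$, the specific amplitudes we have in this example are

$$
\alpha = \delta = \frac{1}{\sqrt2}, \qquad \xi = 1 \qquad\text{and}\qquad \beta = \gamma = \eta = 0,
$$

which causes the general expression

$$
\begin{aligned}
\frac12 \Big[\; & (\alpha\eta + \alpha\xi + \beta\eta + \beta\xi)\,|000\rangle + (\alpha\eta - \alpha\xi + \beta\eta - \beta\xi)\,|001\rangle \\
+\; & (\alpha\eta + \alpha\xi - \beta\eta - \beta\xi)\,|010\rangle + (\alpha\eta - \alpha\xi - \beta\eta + \beta\xi)\,|011\rangle \\
+\; & (\delta\eta + \delta\xi + \gamma\eta + \gamma\xi)\,|100\rangle + (\delta\eta - \delta\xi + \gamma\eta - \gamma\xi)\,|101\rangle \\
+\; & (\delta\eta + \delta\xi - \gamma\eta - \gamma\xi)\,|110\rangle + (\delta\eta - \delta\xi - \gamma\eta + \gamma\xi)\,|111\rangle \;\Big]
\end{aligned}
$$

to reduce to

$$
\begin{aligned}
& \frac12 \Big[\, (\alpha\xi)\,|000\rangle + (-\alpha\xi)\,|001\rangle + (\alpha\xi)\,|010\rangle + (-\alpha\xi)\,|011\rangle \\
& \qquad + (\delta\xi)\,|100\rangle + (-\delta\xi)\,|101\rangle + (\delta\xi)\,|110\rangle + (-\delta\xi)\,|111\rangle \,\Big] \\
=\; & \frac12 \Big[\, \tfrac{1}{\sqrt2}\,|000\rangle - \tfrac{1}{\sqrt2}\,|001\rangle + \tfrac{1}{\sqrt2}\,|010\rangle - \tfrac{1}{\sqrt2}\,|011\rangle \\
& \qquad + \tfrac{1}{\sqrt2}\,|100\rangle - \tfrac{1}{\sqrt2}\,|101\rangle + \tfrac{1}{\sqrt2}\,|110\rangle - \tfrac{1}{\sqrt2}\,|111\rangle \,\Big].
\end{aligned}
$$

A short multiplication reveals this to be the answer we got for the input $\left(\frac{|00\rangle + |11\rangle}{\sqrt2}\right)|1\rangle$, without the general formula.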


[Exercise. Verify that this is equal to the directly computed output state. Hint: If you start with the first state we got, prior to the factorization, there is less to do.]


[Exercise. This had better be a normalized state as we started with unit vectors 
and applied unitary gates. Confirm it.] 


The Born Rule for Tripartite Systems 


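The Born rule can be generalized to any dimension and stated in many ways. For now, let's state the rule for an order-three Hilbert space with registers A, B and C, and in a way that favors factoring out the AB-registers.

Trait #15′ (Born Rule for Tripartite States): If we have a tripartite state that can be expressed as the sum of four terms

$$
|\varphi\rangle^3 \;=\; |0\rangle^2_{AB}\,|\psi_0\rangle_C \;+\; |1\rangle^2_{AB}\,|\psi_1\rangle_C \;+\; |2\rangle^2_{AB}\,|\psi_2\rangle_C \;+\; |3\rangle^2_{AB}\,|\psi_3\rangle_C ,
$$

each of which is the product of a distinct CBS ket for $\mathcal{H}_A \otimes \mathcal{H}_B$ and some general first order (typically un-normalized) ket in the space $\mathcal{H}_C$,

$$
|k\rangle^2_{AB}\,|\psi_k\rangle_C ,
$$

then if we measure the first two registers, thus forcing their collapse into one of the four basis states,

$$
\big\{\, |0\rangle^2_{AB},\; |1\rangle^2_{AB},\; |2\rangle^2_{AB},\; |3\rangle^2_{AB} \,\big\},
$$

the C register will be left in a normalized state associated with the measured CBS ket. In other words,

$$
\begin{aligned}
A \otimes B \searrow 0 \;&\Longrightarrow\; C \searrow \frac{|\psi_0\rangle}{\sqrt{\langle\psi_0|\psi_0\rangle}}, \\
A \otimes B \searrow 1 \;&\Longrightarrow\; C \searrow \frac{|\psi_1\rangle}{\sqrt{\langle\psi_1|\psi_1\rangle}}, \\
A \otimes B \searrow 2 \;&\Longrightarrow\; C \searrow \frac{|\psi_2\rangle}{\sqrt{\langle\psi_2|\psi_2\rangle}}, \\
A \otimes B \searrow 3 \;&\Longrightarrow\; C \searrow \frac{|\psi_3\rangle}{\sqrt{\langle\psi_3|\psi_3\rangle}}.
\end{aligned}
$$

Note the prime (′) in the tripartite Trait #15′, to distinguish this from the un-primed Trait #15 for bipartite systems. Also, note that I suppressed the state-space subscript labels A, B and C which are understood by context.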


We'll use this form of the Born rule for quantum teleportation in our next lecture. 
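In coordinates, factoring out the AB-kets is just slicing: the amplitudes of $|k\rangle^2_{AB}|0\rangle_C$ and $|k\rangle^2_{AB}|1\rangle_C$ sit in positions $2k$ and $2k+1$ of the eight-entry vector. A minimal sketch under that observation follows; the function name `born_collapse_ab` is hypothetical, and the returned probability (the squared norm of the selected pair) is a standard fact not stated in the trait itself. The sample state is the output of the three-qubit circuit worked out earlier.

```python
import math

def born_collapse_ab(state, k):
    """Tripartite Born rule when registers A and B are measured together.

    state : 8 amplitudes in CBS order |000>, |001>, ..., |111>
    k     : measured AB value as an integer 0..3 (so "10" means k = 2)
    Returns (probability of k, normalized state left in the C register).
    """
    psi_k = state[2 * k: 2 * k + 2]          # un-normalized C-ket paired with |k>_AB
    norm = math.sqrt(sum(abs(z) ** 2 for z in psi_k))
    if norm == 0:
        raise ValueError("this outcome has zero probability")
    return norm ** 2, [z / norm for z in psi_k]

# Output state of the earlier circuit: amplitudes +c, -c, +c, -c, ... with c = 1/(2*sqrt(2)).
c = 1 / (2 * math.sqrt(2))
state = [c, -c, c, -c, c, -c, c, -c]

prob, c_state = born_collapse_ab(state, 2)   # AB measured as "10"
print(prob, c_state)
```

For this fully separable state every AB outcome has probability 1/4 and leaves C in $(|0\rangle - |1\rangle)/\sqrt2$, as expected when C is not entangled with the other registers.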


11.8.4 End of Lesson 


This was a landmark week, incorporating all the basic ideas of quantum computing. 
We are now ready to study our first quantum algorithms. 




Chapter 12 


First Quantum Algorithms 


12.1 Three Algorithms 


This week we will see how quantum circuits and their associated algorithms can be 
used to achieve results impossible in the world of classical digital computing. We’ll 
cover three topics: 


• Superdense Coding
• Quantum Teleportation
• Deutsch's Algorithm

The first two demonstrate quantum communication possibilities, and the third provides a learning framework for many quantum algorithms which execute "faster" (in a sense) than their classical counterparts.


12.2 Superdense Coding 


12.2.1 Sending Information by Qubit 


A single qubit,

$$
|\psi\rangle \;=\; \alpha|0\rangle + \beta|1\rangle,
$$

seems like it holds an infinite amount of information. After all, $\alpha$ and $\beta$ are complex numbers, and even though you can't choose them arbitrarily ($|\alpha|^2 + |\beta|^2$ must be 1), the mere fact that $\alpha$ can be any complex number whose magnitude is ≤ 1 means it could be an unending sequence of never-repeating digits, like 0.4193980022903... . If a sender $\mathcal{A}$ (an assistant quantum researcher named "Alice") could pack $|\psi\rangle$ with that $\alpha$ (and compatible $\beta$) and send it off in the form of a single photon to a receiver $\mathcal{B}$ (another helper whose name is "Bob") a few time zones away, $\mathcal{A}$ would be sending an infinite string of digits to $\mathcal{B}$ encoded in that one sub-atomic particle.




The problem, of course, arises when $\mathcal{B}$ tries to look inside the received state. All he can do is measure it once and only once (Trait #7, the fifth postulate of QM), at which point he gets a "0" or "1" and both $\alpha$ and $\beta$ are wiped off the face of the Earth. That one measurement tells $\mathcal{B}$ very little.

[Exercise. But it does tell him something. What?]

In short, to communicate $|\alpha|$, $\mathcal{A}$ would have to prepare and send an infinite number of identical states, then $\mathcal{B}$ would have to receive, test and record them. Only then would $\mathcal{B}$ know $|\alpha|$ and $|\beta|$ (although neither $\alpha$ nor $\beta$). This is no better than classical communication.
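To get a feel for how slowly repeated measurement pins down $|\alpha|$, here is a toy classical simulation. It assumes only the Born probabilities (outcome "0" with probability $|\alpha|^2$); the function name, the sample amplitudes and the seed are arbitrary choices of mine, and a pseudo-random generator stands in for the quantum measurement.

```python
import cmath
import math
import random

def estimate_alpha_magnitude(alpha, beta, copies, seed=0):
    """Estimate |alpha| by 'measuring' many identical copies of alpha|0> + beta|1>."""
    if not math.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0):
        raise ValueError("state must be normalized")
    rng = random.Random(seed)
    p0 = abs(alpha) ** 2                     # Born probability of reading "0"
    zeros = sum(1 for _ in range(copies) if rng.random() < p0)
    return math.sqrt(zeros / copies)

alpha = math.sqrt(0.3)                       # sample amplitudes, chosen arbitrarily
beta = math.sqrt(0.7)

for copies in (10, 1000, 100000):
    print(copies, estimate_alpha_magnitude(alpha, beta, copies))

# Giving alpha a phase changes nothing Bob can see: only |alpha| enters the statistics.
phased = alpha * cmath.exp(1j * 1.0)
print(estimate_alpha_magnitude(phased, beta, 100000))
```

The estimates approach $\sqrt{0.3} \approx 0.548$ only as the number of copies grows, and the phase of $\alpha$ never shows up, which is the point of the paragraph above.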


We have to lower our sights. 


A Most Modest Wish 


We are wondering what information, exactly, $\mathcal{A}$ (Alice) can send $\mathcal{B}$ (Bob) in the form of a single qubit. We know it's not infinite. At the other extreme is the most modest super-classical capability we could hope for: two classical bits for the price of one. I think that we need no lecture to affirm the claim that, in order to send a two-digit binary message, i.e., one of

0 = "00",
1 = "01",
2 = "10",
3 = "11",

we would have to send more than one classical bit; we'd need two. Can we pack at least this meager amount of classical information into one qubit with the confidence that $\mathcal{B}$ would be able to read the message?


12.2.2 The Superdense Coding Algorithm 


We can, and to do so, we use the four Bell states (EPR pairs) from the two qubit 
lecture, 


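\[
\begin{aligned}
|\beta_{00}\rangle &= \frac{|00\rangle + |11\rangle}{\sqrt 2}, \\
|\beta_{01}\rangle &= \frac{|01\rangle + |10\rangle}{\sqrt 2}, \\
|\beta_{10}\rangle &= \frac{|00\rangle - |11\rangle}{\sqrt 2} \quad \text{and} \\
|\beta_{11}\rangle &= \frac{|01\rangle - |10\rangle}{\sqrt 2}.
\end{aligned}
\]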


Building the Communication Equipment 


$\mathcal{A}$ and $\mathcal{B}$ prepare the state $|\beta_{00}\rangle$. (This can be done, for
example, by sending a $|00\rangle$ through the BELL gate,

[Circuit: two $|0\rangle$ inputs; H acts on the upper wire, followed by a CNOT controlled by
the upper wire and targeting the lower wire; the output pair is $|\beta_{00}\rangle$.]

as we learned.) $\mathcal{A}$ takes the A register of the entangled state $|\beta_{00}\rangle$
and $\mathcal{B}$ takes the B register. $\mathcal{B}$ gets on a plane, placing his qubit in the
overhead bin, and travels a few time zones away. This can all be done long before the
classical two-bit message is selected by $\mathcal{A}$, but it has to be done. It can even be
done by a third party who sends the first qubit of this EPR pair to $\mathcal{A}$ and the
second to $\mathcal{B}$.


Defense of Your Objection. The sharing of this qubit does not constitute
sending more than one qubit of information (the phase yet to come), since it is
analogous to establishing a radio transmission protocol or message envelope, which
would have to be done even with classical bits. It is part of the equipment that
$\mathcal{A}$ and $\mathcal{B}$ use to communicate data, not the data itself.


Notation. In the few cases where we need it (and one is coming up), let’s build 
some notation. When a potentially entangled two-qubit state is separated physically 
into two registers or by two observers, we need a way to talk about each individual 
qubit. We’ll use 


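\[ |\psi\rangle^2\big|_A \quad \text{for the A register (or } \mathcal{A}\text{'s) qubit, and} \]

\[ |\psi\rangle^2\big|_B \quad \text{for the B register (or } \mathcal{B}\text{'s) qubit.} \]


Note that, unless $|\psi\rangle^2$ happens to be separable (and $|\beta_{00}\rangle$ is clearly
not), we will be faced with the reality that

\[ |\psi\rangle^2 \;\ne\; |\psi\rangle^2\big|_A \otimes |\psi\rangle^2\big|_B . \]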


[Note. This does not mean that the A register and B register can’t exist in physically 
independent locations and be measured or processed independently by different ob- 
servers. As we learned, one observer can modify or measure either qubit individually. 
What it does mean is that the two registers are entangled so modifying or measuring 
one will affect the other. Together they form a single state. ] 
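
A quick numerical way to see that $|\beta_{00}\rangle$ is not separable: a two-qubit state with amplitudes $a_{00}, a_{01}, a_{10}, a_{11}$ factors into a product of one-qubit kets exactly when $a_{00}a_{11} - a_{01}a_{10} = 0$. The sketch below applies that standard test; the helper name and the comparison state are illustrative assumptions, not notation from the lecture.

```python
import math

def is_separable(state, tol=1e-12):
    """Two-qubit amplitudes [a00, a01, a10, a11] form a product state exactly
    when a00*a11 - a01*a10 vanishes (the 2x2 amplitude matrix has rank 1)."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

s = 1 / math.sqrt(2)
beta_00 = [s, 0, 0, s]          # (|00> + |11>) / sqrt(2)
product_state = [s, s, 0, 0]    # |0> tensor (|0> + |1>) / sqrt(2)

beta_00_separable = is_separable(beta_00)
product_separable = is_separable(product_state)
print("beta_00 separable?", beta_00_separable)
print("product state separable?", product_separable)
```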


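With this language, the construction and distribution of each half of the entangled
$|\beta_{00}\rangle$ to $\mathcal{A}$ and $\mathcal{B}$ can be symbolized by

\[
|\beta_{00}\rangle \;\longrightarrow\;
\begin{cases}
|\beta_{00}\rangle\big|_A & \text{goes to } \mathcal{A} \\[4pt]
|\beta_{00}\rangle\big|_B & \text{goes to } \mathcal{B}
\end{cases}
\]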




$\mathcal{A}$ Encodes the Message


When $\mathcal{A}$ is ready to ship one of the four bit strings to $\mathcal{B}$, she decides
(or is informed) which it is to be and takes the following action. She submits her half of
the bipartite state to one of four local gates according to the table


    $\mathcal{A}$ to Send    $\mathcal{A}$ Applies    Equivalent Binary Gate    New Bipartite State
    ---------------------    ---------------------    ----------------------    --------------------
    “00”                     (nothing)                $1 \otimes 1$             $|\beta_{00}\rangle$
    “01”                     $X$                      $X \otimes 1$             $|\beta_{01}\rangle$
    “10”                     $Z$                      $Z \otimes 1$             $|\beta_{10}\rangle$
    “11”                     $iY$                     $iY \otimes 1$            $|\beta_{11}\rangle$


(The “$\otimes\, 1$”s in the equivalent binary gate column reflect the fact that $\mathcal{B}$
is not touching his half of $|\beta_{00}\rangle$, which is effectively the identity operation
as far as the B register is concerned.)


And how do we know that the far right column is the result of $\mathcal{A}$’s local operation?
We apply the relevant matrix to $|\beta_{00}\rangle$ and read off the answer (see section Four
Bell States from One in the two qubit lecture).
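
That matrix check can be sketched in a few lines. The code below is an illustration using plain Python lists with amplitudes ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the helper names `kron`, `matvec` and `encode` are assumptions made for this sketch.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matvec(m, v):
    """Matrix times column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

s = 1 / math.sqrt(2)
# Bell states as amplitude lists over |00>, |01>, |10>, |11>
BELL_STATES = {
    "00": [s, 0, 0, s],
    "01": [0, s, s, 0],
    "10": [s, 0, 0, -s],
    "11": [0, s, -s, 0],
}
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
iY = [[0, 1], [-1, 0]]          # i times the Pauli Y matrix
ENCODERS = {"00": I2, "01": X, "10": Z, "11": iY}

def encode(message):
    """Apply Alice's local gate (tensored with the identity on B) to |beta_00>."""
    return matvec(kron(ENCODERS[message], I2), BELL_STATES["00"])

encoded = {msg: encode(msg) for msg in ENCODERS}
for msg, state in encoded.items():
    matches = all(abs(a - b) < 1e-12 for a, b in zip(state, BELL_STATES[msg]))
    print(msg, "-> beta_" + msg, "reproduced:", matches)
```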


Compatibility Note. Most authors ask $\mathcal{A}$ to apply Z if she wants to encode
“01” and X if she wants to encode “10”, but doing so results in the state $|\beta_{10}\rangle$
for “01” and $|\beta_{01}\rangle$ for “10”, not a very nice match-up, which is why I chose to
present the algorithm with those two gates swapped. Of course, it really doesn’t matter which
of the four operators $\mathcal{A}$ uses for each encoding, as long as $\mathcal{B}$ uses the
same correspondence to decode.


If we encapsulate the four possible operators into one symbol, SD (for Super
Dense), which takes on the proper operation based on the message to be encoded,
$\mathcal{A}$’s job is to apply the local circuit,

\[ |\beta_{00}\rangle\big|_A \;\longrightarrow\; \boxed{SD} \;\longrightarrow\;
   \big[ (SD \otimes 1)\, |\beta_{00}\rangle \big]\big|_A . \]

Notice that to describe $\mathcal{A}$’s half of the output state, we need to first show the
full effect of the bipartite operator and only then restrict attention to $\mathcal{A}$’s
qubit. We cannot express it as a function of $\mathcal{A}$’s input,
$|\beta_{00}\rangle\big|_A$, alone.
A 


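$\mathcal{A}$ Sends Her Qubit


The message is now encoded in the bipartite state, but for $\mathcal{B}$ to decode it, he needs
both qubits. $\mathcal{A}$ now sends her qubit to $\mathcal{B}$.

\[ \big[ (SD \otimes 1)\, |\beta_{00}\rangle \big]\big|_A : \qquad
   \mathcal{A} \;\longrightarrow\; \mathcal{B} \]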


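$\mathcal{B}$ Measures Along the Bell Basis to Read the Message


When he gets $\mathcal{A}$’s qubit, $\mathcal{B}$ has the complete entangled bipartite state

\[
\left.
\begin{array}{l}
\big[ (SD \otimes 1)\, |\beta_{00}\rangle \big]\big|_A \\[4pt]
\big[ (SD \otimes 1)\, |\beta_{00}\rangle \big]\big|_B
\end{array}
\right\} \;=\; (SD \otimes 1)\, |\beta_{00}\rangle ,
\]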


so he can now measure both qubits to determine which of the four Bell states he has. 
Once that’s done he reads the earlier table from right-to-left to recover the classical 
two-bit message. 


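Refresher: Measuring Along the Bell Basis. Since this is the first time we
will have applied it in an algorithm, I’ll summarize one way that $\mathcal{B}$ can measure
his entangled state along the Bell basis. When studying two qubit logic we learned
that to measure a bipartite state along a non-standard basis (call it C), we find the
binary operator that takes the z-basis to the other basis, call it S, and use $S^\dagger$
prior to measurement:

\[ (\text{some } C\text{-basis state}) \;\longrightarrow\; \boxed{S^\dagger}
   \;\longrightarrow\; \text{measure along z-basis} \]

In this situation, S is BELL, whose adjoint we computed in that same lecture,

[Circuit: $\mathrm{BELL}^\dagger$ is a CNOT (upper wire controls, lower wire is the target)
followed by H on the upper wire.]

Adding the measurement symbols (the “meters”) along the z-basis, the circuit becomes

[Circuit: (one of the four Bell states) enters $\mathrm{BELL}^\dagger$, and each of the two
output wires ends in a z-basis meter.]

In terms of matrices, $\mathcal{B}$ subjects his two-qubit state to the matrix for
$\mathrm{BELL}^\dagger$ (also computed last time),

\[
\mathrm{BELL}^\dagger \;=\; \frac{1}{\sqrt 2}
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
1 & 0 & 0 & -1 \\
0 & 1 & -1 & 0
\end{pmatrix} .
\]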


Bob’s Action and Conclusion. Post-processing with the $\mathrm{BELL}^\dagger$ gate turns
the four Bell states into four z-CBS kets; if $\mathcal{B}$ follows that gate with a z-basis
measurement and sees a “01”, he will conclude that he had received the Bell state
$|\beta_{01}\rangle$ from $\mathcal{A}$, and likewise for the other states. So his role, after
receiving the qubit sent by $\mathcal{A}$, is to


1. apply $\mathrm{BELL}^\dagger$ to his two qubits, and


2. read the encoded message according to his results using the table


    $\mathcal{B}$ measures    $\mathcal{B}$ Concludes $\mathcal{A}$’s Message to him is
    ----------------------    ------------------------------------------------------
    “00”                      “00”
    “01”                      “01”
    “10”                      “10”
    “11”                      “11”


In other words, the application of the $\mathrm{BELL}^\dagger$ gate allowed $\mathcal{B}$ to
interpret his z-basis measurement reading “xy” as the message, itself.
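
Bob's two steps can be sketched numerically. The code below is an illustration, not the lecture's own implementation: it applies the $\mathrm{BELL}^\dagger$ matrix to each Bell state and reports the z-basis outcome together with its probability. The names `decode` and `decoded` are assumptions made for the sketch.

```python
import math

def matvec(m, v):
    """Matrix times column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

s = 1 / math.sqrt(2)
BELL_DAGGER = [
    [s, 0, 0, s],
    [0, s, s, 0],
    [s, 0, 0, -s],
    [0, s, -s, 0],
]
# Bell states as amplitude lists over |00>, |01>, |10>, |11>
BELL_STATES = {
    "00": [s, 0, 0, s],
    "01": [0, s, s, 0],
    "10": [s, 0, 0, -s],
    "11": [0, s, -s, 0],
}

def decode(state):
    """Apply BELL-dagger, then return the most likely z-basis reading and its probability."""
    out = matvec(BELL_DAGGER, state)
    probs = [abs(a) ** 2 for a in out]
    k = max(range(4), key=lambda i: probs[i])
    return format(k, "02b"), probs[k]

decoded = {msg: decode(state) for msg, state in BELL_STATES.items()}
for msg, (reading, prob) in decoded.items():
    print("sent", msg, "-> Bob reads", reading, "with probability", round(prob, 12))
```

Because each Bell state lands on a single z-basis ket, the reported probability is 1 (up to rounding) in every case, so the measurement involves no guesswork.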


The following exercise should help crystallize the algorithm. 


[Exercise. Assume $\mathcal{A}$ wants to send the message “11.”


i) Apply $iY \otimes 1$ to $|\beta_{00}\rangle$ and confirm that you get $|\beta_{11}\rangle$ out.


ii) Multiply the 4 × 4 matrix for $\mathrm{BELL}^\dagger$ by the 4 × 1 state vector for
$|\beta_{11}\rangle$ to show that $\mathcal{B}$ recovers the message “11.”]


A Circuit Representation of Superdense Coding 


We can get a circuit for the overall superdense coding algorithm by adding some new 
notation. 


Classical Wires. Double lines (=) indicate the transfer of classical bits. We use 
them to move one or more ordinary digits within a circuit. 




Decisions Based on Classical Bits. We insert a dot symbol, •, into a classical
line to indicate a general controlled operation, based on the content of that classical
data ([1] means apply an operator, [0] means don’t).


Noiseless Transmission Between Communicators. To indicate the (typ- 
ically radio) transmission of either quantum or classical data between sender and 
receiver, we use the wavy line, ~~. 


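With this notation, the superdense coding circuit can be expressed as:

[Circuit: $\mathcal{A}$’s classical bits x and y travel on double lines and control (filled
circles) the SD gate on the A register, $|\beta_{00}\rangle\big|_A$. The A-register qubit is
then transmitted (wavy line) to $\mathcal{B}$, where it joins the B register,
$|\beta_{00}\rangle\big|_B$, as the two inputs of $\mathrm{BELL}^\dagger$. Both outputs end in
meters, reading $|x\rangle$ and $|y\rangle$.]

The notation tells the story. $\mathcal{A}$ uses her two-bit classical message “xy” (traveling
on the double lines) to control (filled circles) which of the four operations (SD = 1, X,
Z or iY) she will apply to her qubit. After she sends her qubit to $\mathcal{B}$, $\mathcal{B}$
measures both qubits along the Bell basis to recover the message “xy” now sitting in the
output registers in natural z-basis form $|x\rangle\,|y\rangle$.


[Exercise. Measurement involves collapse and uncertainty. Why is $\mathcal{B}$ so certain
that his two measurements will always result in a true reproduction of the message
“xy” sent by $\mathcal{A}$? Hint: For each of the four possible messages, what bipartite state
is he holding at the moment of measurement?]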


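This can actually be tightened up. You’ve seen several unary operator identities
in the single qubit lecture, one of which was $XZ = -iY$. A slight revision of this
(verify as an exercise) is

\[ ZX \;=\; iY , \]

which enables us to define the elusive SD operation: we place a controlled-X gate and
controlled-Z gate in the A-channel under $\mathcal{A}$’s supervision. Each gate is controlled
by one of the two classical bits in her message. They work just like a quantum
Controlled-U gate, only simpler: if the classical control bit is 1, the target operation
is applied; if the bit is 0, it is not.

[Circuit: the same superdense coding circuit, with SD replaced on the A register by an X gate
followed by a Z gate. Each gate is controlled (filled circle) by one of $\mathcal{A}$’s
classical bits; per the encoding table, y controls X and x controls Z. The A-register qubit is
transmitted (wavy line) to $\mathcal{B}$, who applies $\mathrm{BELL}^\dagger$ to both registers
and reads $|x\rangle$ and $|y\rangle$ on the meters.]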




For example, if both bits are 1, both gates get applied and result in the desired
behavior: “11” ⇒ ZX = iY.
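
The classically controlled pair of gates can be checked with a few lines of integer matrix arithmetic. This is a sketch under one stated assumption, read off the encoding table: bit y switches on X (which acts first) and bit x switches on Z (which acts second). The function names are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
iY = [[0, 1], [-1, 0]]          # i times the Pauli Y matrix

def sd_operator(x, y):
    """SD for message bits x, y: X acts first (if y == 1), then Z (if x == 1)."""
    first = X if y == 1 else I2
    second = Z if x == 1 else I2
    # The gate applied later in the circuit multiplies on the left in algebra.
    return matmul(second, first)

sd_table = {(x, y): sd_operator(x, y) for x in (0, 1) for y in (0, 1)}
print("ZX == iY:", matmul(Z, X) == iY)
print("SD for message 11 == iY:", sd_table[(1, 1)] == iY)
```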


[Exercise. Remind us why the gates X and Z appear reversed in the circuit
relative to the algebraic identity iY = ZX.]


The Significance of Superdense Coding 


This technique may not seem tremendously applicable considering its unimpressive 
2-bit to 1-bit compression, but consider sending a large classical message, even one 
that is already as densely compressed as classical logic will allow. This is a 2-to-1 
improvement over the best classical technique when applied to the output of classical 
compression. The fact that we have to send lots of entangled Bell states before our 
message takes nothing away from our ability to send information in half the time (or 
space) as before. 


12.3 Quantum Teleportation


We re-enlist the help of our two most excellent researchers, Alice and Bob, and continue
to refer to them by their code names $\mathcal{A}$ and $\mathcal{B}$.


In superdense coding $\mathcal{A}$ sent $\mathcal{B}$ one qubit,
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, in order to reconstruct two classical bits.
Quantum teleportation is the mirror image of this process. $\mathcal{A}$ wants to send
$\mathcal{B}$ the qubit, $|\psi\rangle$, by sending him just two classical bits of information.


Giving Teleportation Context 


You might ask why she doesn’t simply send $\mathcal{B}$ the one qubit and be done with it.
Why be so indirect and translate the quantum information into classical bits? There
are many answers, two of which I think are important.


1. As a practical matter, it may be impossible, or at least difficult, for $\mathcal{A}$ to
send $\mathcal{B}$ qubit information due to its instability and/or expense. By contrast,
humans have engineered highly reliable and economical classical communication
channels. Sending two bits is child’s play.


2. Sending the original qubit rather than two classical bits is somewhat beside the
point. The very fact that $\mathcal{A}$ can get the infinitely precise data embedded in the
continuous scalars $\alpha$ and $\beta$ over to $\mathcal{B}$ by sending something as crude as
an integer from 0 to 3 should come as unexpectedly marvelous news, and we want to know why
and how this can be done.




Caveats 


There is the usual caveat. Just because $\mathcal{B}$ gets the qubit doesn’t mean he can know
what it is. He can no more examine its basis coefficients than $\mathcal{A}$ (or anyone in her
local lab who didn’t already know their values) could. What we are doing here is
getting the qubit over to $\mathcal{B}$’s lab so he can use it on his end for any purpose that
$\mathcal{A}$ could have (before the teleportation).


And then there’s the unusual caveat. In the process of executing the teleportation,
$\mathcal{A}$ loses her copy of $|\psi\rangle$. We’ll see why as we describe the algorithm.


An Application of the Born Rule 


I like to take any reasonable opportunity to restate important tools so as to establish 
them in your mind. The Born rule is so frequently used that it warrants such a review 
before we apply it to teleportation. 


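The Born rule for 3-qubit systems (Trait #15′) tells us (and I am paraphrasing
with equally precise expressions) that if we have a tripartite state which can be
expressed as the sum of four terms (where the first factors of each term are AB-basis
kets):

\[
|\varphi\rangle^3 \;=\; |0\rangle^2_{AB}\,|\psi_0\rangle_C \;+\; |1\rangle^2_{AB}\,|\psi_1\rangle_C
\;+\; |2\rangle^2_{AB}\,|\psi_2\rangle_C \;+\; |3\rangle^2_{AB}\,|\psi_3\rangle_C ,
\]

then an AB-register measurement along the natural basis will force the corresponding
C-register collapse according to

\[ A \otimes B \searrow |0\rangle^2_{AB} \quad\Longrightarrow\quad
   C \searrow \frac{|\psi_0\rangle_C}{\sqrt{\langle \psi_0 | \psi_0 \rangle}} \]

\[ A \otimes B \searrow |1\rangle^2_{AB} \quad\Longrightarrow\quad
   C \searrow \frac{|\psi_1\rangle_C}{\sqrt{\langle \psi_1 | \psi_1 \rangle}} \]

etc.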


There are two consequences that will prepare us for understanding quantum tele- 
portation as well as anticipating other algorithms that might employ this special 
technique. 

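Consequence #1. The rule works for any orthonormal basis in channels A and
B, not just the natural basis. Whichever basis we choose for the first two registers A
and B, it is along that basis that we must make our two-qubit measurements. So, if
we use the Bell basis, $\{\, |\beta_{jk}\rangle \,\}$, then a state in the form

\[
\begin{aligned}
|\varphi\rangle^3 \;=\;\; & |\beta_{00}\rangle_{AB}\,|\psi_{00}\rangle_C \;+\; |\beta_{01}\rangle_{AB}\,|\psi_{01}\rangle_C \\
 +\; & |\beta_{10}\rangle_{AB}\,|\psi_{10}\rangle_C \;+\; |\beta_{11}\rangle_{AB}\,|\psi_{11}\rangle_C ,
\end{aligned}
\]

when measured along that basis will force the corresponding C-register collapse according to

\[ A \otimes B \searrow |\beta_{00}\rangle_{AB} \quad\Longrightarrow\quad
   C \searrow \frac{|\psi_{00}\rangle_C}{\sqrt{\langle \psi_{00} | \psi_{00} \rangle}} \]

\[ A \otimes B \searrow |\beta_{01}\rangle_{AB} \quad\Longrightarrow\quad
   C \searrow \frac{|\psi_{01}\rangle_C}{\sqrt{\langle \psi_{01} | \psi_{01} \rangle}} \]

etc.


This follows from Trait #7, Post-Measurement Collapse, which tells us that AB
will collapse to one of the four CBS states (regardless of which CBS we use), forcing
C into the state that is glued to its partner in the above expansion.


The division by each $\|\, |\psi_{jk}\rangle \,\|$ (or, if you prefer,
$\sqrt{\langle \psi_{jk} | \psi_{jk} \rangle}$) is necessary because the overall tripartite
state, $|\varphi\rangle^3$, can only be normalized when the $|\psi_{jk}\rangle$ have non-unit
(in fact < 1) lengths.


[Exercise. We already know that $|\beta_{jk}\rangle$ are four normalized CBS kets. Show that
if the $|\psi_{jk}\rangle$ were normal vectors in $\mathcal{H}_C$, then $|\varphi\rangle^3$
would not be a normal vector. Hint: Write down ${}^3\langle \varphi | \varphi \rangle^3$ and
apply orthonormality of the Bell states.]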


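Consequence #2. If we know that the four general states, $|\psi_{jk}\rangle$, are just four
variations of a single known state, we may be able to glean even more specific
information about the collapsed C-register. To cite the example needed today, say we
know that all four $|\psi_{jk}\rangle$ use the same two scalar coordinates, $\alpha$ and
$\beta$, only in slightly different combinations,

\[
\begin{aligned}
|\varphi\rangle^3 \;=\;\;
 & |\beta_{00}\rangle_{AB} \left( \frac{\alpha|0\rangle_C + \beta|1\rangle_C}{2} \right)
 \;+\; |\beta_{01}\rangle_{AB} \left( \frac{\beta|0\rangle_C + \alpha|1\rangle_C}{2} \right) \\
 +\; & |\beta_{10}\rangle_{AB} \left( \frac{\alpha|0\rangle_C - \beta|1\rangle_C}{2} \right)
 \;+\; |\beta_{11}\rangle_{AB} \left( \frac{-\beta|0\rangle_C + \alpha|1\rangle_C}{2} \right) .
\end{aligned}
\]

(Each denominator 2 is needed to produce a normal state $|\varphi\rangle^3$; we cannot absorb
it into $\alpha$ and $\beta$, as those scalars are fixed by the normalized $|\psi\rangle$ to be
teleported. However, the Born rule tells us that the collapse of the C-register will get rid
of this factor, leaving only one of the four numerators in the C-register.) Such a happy
state-of-affairs will allow us to convert any of the four collapsed states in the C-register
to the one state,

\[ \alpha|0\rangle + \beta|1\rangle \]

by mere application of a simple unary operator. For example, if we find that AB
collapses to $|\beta_{00}\rangle$ (by reading a “00” on our measuring apparatus), then C will
have already collapsed to the state $\alpha|0\rangle + \beta|1\rangle$. Or, if AB collapses to
$|\beta_{11}\rangle$ (meter reads “11”), then we apply the operator $iY$ to C to recover
$\alpha|0\rangle + \beta|1\rangle$, because

\[
iY \big( -\beta|0\rangle + \alpha|1\rangle \big)
\;=\; \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} -\beta \\ \alpha \end{pmatrix}
\;=\; \begin{pmatrix} \alpha \\ \beta \end{pmatrix} .
\]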


You'll refer back to these two facts as we unroll the quantum teleportation algorithm. 




12.3.1 The Quantum Teleportation Algorithm 


We continue to exploit the EPR pairs which I list again for quick reference: 


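\[
\begin{aligned}
|\beta_{00}\rangle &= \frac{|00\rangle + |11\rangle}{\sqrt 2}, \\
|\beta_{01}\rangle &= \frac{|01\rangle + |10\rangle}{\sqrt 2}, \\
|\beta_{10}\rangle &= \frac{|00\rangle - |11\rangle}{\sqrt 2} \quad \text{and} \\
|\beta_{11}\rangle &= \frac{|01\rangle - |10\rangle}{\sqrt 2}.
\end{aligned}
\]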


Building the Communication Equipment 


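$\mathcal{A}$ and $\mathcal{B}$ prepare (and each get one qubit of) the bipartite state
$|\beta_{00}\rangle$.


$\mathcal{A}$ Produces a General Qubit for Teleportation


When $\mathcal{A}$ is ready to teleport a qubit to $\mathcal{B}$, she manually produces (or is
given) any $\mathcal{H}$-space qubit

\[ |\psi\rangle \;=\; |\psi\rangle_C \;\equiv\; \alpha|0\rangle_C + \beta|1\rangle_C . \]

The subscript C indicates that we have a qubit separate from the two entangled
qubits already created and distributed to our two “messengers,” a qubit which lives in
its own space with its own (natural) CBS basis $\{\, |0\rangle_C ,\; |1\rangle_C \,\}$.


By tradition for this algorithm, we place the C-channel above the A/B-Channels:

[Circuit layout: register C on top, carrying $|\psi\rangle_C$; registers A and B below it,
jointly carrying $|\beta_{00}\rangle_{AB}$.]

The algebraic state of the entire system, then, starts out to be

\[
|\varphi\rangle^3 \;=\; |\psi\rangle_C\, |\beta_{00}\rangle_{AB}
\;=\; \big( \alpha|0\rangle_C + \beta|1\rangle_C \big) \otimes
\left( \frac{|0\rangle_A |0\rangle_B + |1\rangle_A |1\rangle_B}{\sqrt 2} \right) .
\]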


The Plan 


$\mathcal{A}$ starts by teleporting a qubit to $\mathcal{B}$. No information is actually sent.
Rather, $\mathcal{A}$ does something to her entangled qubit, $|\beta_{00}\rangle\big|_A$, which
instantaneously modifies $\mathcal{B}$’s qubit, $|\beta_{00}\rangle\big|_B$, faster than the
speed of light. This is the meaning of the word teleportation.


She then follows that up by taking a measurement of her two qubits, getting two
classical bits of information: the outcomes of the two register readings. Finally, she
sends the result of that measurement as a classical two-bit message to $\mathcal{B}$ (sorry, we
have to obey Einstein’s speed limit for this part). $\mathcal{B}$ will use the two classical
bits he receives from Alice to tweak his qubit (already modified by Alice’s teleportation)
into the desired state, $|\psi\rangle$.


$\mathcal{A}$ Expresses the System State in the Bell Basis (No Action Yet)


In the z-basis, all the information about $|\psi\rangle$ is contained in $\mathcal{A}$’s
C-register. She wants to move that information over to $\mathcal{B}$’s B-register. Before she
even does anything physical, she can accomplish most of the hard work by just rearranging the
tripartite state $|\varphi\rangle^3$ in a factored form expanded along a CA Bell-basis rather
than a CA z-basis. In other words, we’d like to see

\[
\begin{aligned}
|\varphi\rangle^3 \;=\;\; & |\beta_{00}\rangle_{CA}\,|\psi_{00}\rangle_B \;+\; |\beta_{01}\rangle_{CA}\,|\psi_{01}\rangle_B \\
 +\; & |\beta_{10}\rangle_{CA}\,|\psi_{10}\rangle_B \;+\; |\beta_{11}\rangle_{CA}\,|\psi_{11}\rangle_B ,
\end{aligned}
\]

where the $|\psi_{jk}\rangle_B$ are (for the moment) four unknown B-channel states. We can
only arrive at such a CA Bell basis expression if the two channels A and C become
entangled, which they are not, initially. We’ll get to that.


In our short review of the Born rule, above, I gave you a preview of the actual
expression we’ll need. This is what we would like/wish/hope for:

\[
\begin{aligned}
|\varphi\rangle^3 \;=\;\;
 & |\beta_{00}\rangle_{CA} \left( \frac{\alpha|0\rangle_B + \beta|1\rangle_B}{2} \right)
 \;+\; |\beta_{01}\rangle_{CA} \left( \frac{\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) \\
 +\; & |\beta_{10}\rangle_{CA} \left( \frac{\alpha|0\rangle_B - \beta|1\rangle_B}{2} \right)
 \;+\; |\beta_{11}\rangle_{CA} \left( \frac{-\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) .
\end{aligned}
\]

Indeed, if we could accomplish that, then $\mathcal{A}$ would only have to measure her two
qubits along the Bell basis, forcing a collapse into one of the four Bell states and,
by the Born rule, collapsing $\mathcal{B}$’s register into one of his four matching states. A
glance at the above expressions reveals that this gets us 99.99% of the way toward
placing $|\psi\rangle$ into $\mathcal{B}$’s B-register, i.e., manufacturing $|\psi\rangle_B$, a
teleported twin to Alice’s original $|\psi\rangle_C$. We’ll see how $\mathcal{B}$ gets the last
.01% of the way there, but first, we prove the validity of the hoped-for expansion.


We begin with the desired expression and reduce it to the expression we know
to be our actual starting point, $|\varphi\rangle^3$. (Warning: After the first expression,
I’ll be dropping the state-space subscripts A/B/C and letting position do the job.)


Von (Me + 2) 4 ajo, (Pidte + als 


+ inden (22212) 4 jpg, (ide + olde) 


}00) + |11) ( = ae) _  |01) + |10) Ger i 0) 
J/2 D) | J2 D) 


_ 100) —|11) fal0) — BI) |  [01)—|10) (6/0) + @|1) 
me eae) Se ae 


c |000) + a |110) + 8 |001) + 8 |111) 


1 
- xa 
+ 8|010) + 6/100) + a |011) +a |101) 
+ @|000) — a |110) — 6001) + 8 |111) 


— 8010) + 8 |100) + @|011) — a|101) ) 


Half the terms cancel and the other half reinforce to give 


ro (oe + ae wie: (21 + ole) 


+ Vidog (TBS EBAY + Ifurgy (Hide t ele) 


= (20.|000) +2 |100) + 2a |011) + 28111) ) 


3 


= <3 (210) + 811)) 100) + <3 (210) +811)) la) 
po) 


2 (ao) + 811)) ( 3 


= Ibo |200) ap > 


a happy ending. This was $\mathcal{A}$’s original formulation of the tripartite state in terms
of the z-basis, so it is indeed the same as the Bell expansion we were hoping for.
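
For readers who like to see the algebra confirmed numerically, the following sketch builds both sides of the identity as 8-component amplitude lists (register order C, A, B) and compares them. The sample values of $\alpha$ and $\beta$ and all function names are illustrative assumptions.

```python
import math

def kron_vec(u, v):
    """Tensor product of two state vectors given as amplitude lists."""
    return [a * b for a in u for b in v]

s = 1 / math.sqrt(2)
BELL = {
    "00": [s, 0, 0, s],
    "01": [0, s, s, 0],
    "10": [s, 0, 0, -s],
    "11": [0, s, -s, 0],
}

def near_psi(alpha, beta):
    """The four B-register kets paired with each CA Bell state (before the factor 1/2)."""
    return {
        "00": [alpha, beta],
        "01": [beta, alpha],
        "10": [alpha, -beta],
        "11": [-beta, alpha],
    }

def z_basis_form(alpha, beta):
    """|psi>_C |beta_00>_AB, amplitudes indexed by the bits c, a, b."""
    return kron_vec([alpha, beta], BELL["00"])

def bell_basis_form(alpha, beta):
    """Sum over jk of |beta_jk>_CA times (near-psi_jk)_B / 2, same index order."""
    total = [0] * 8
    for jk, b_ket in near_psi(alpha, beta).items():
        term = kron_vec(BELL[jk], [c / 2 for c in b_ket])
        total = [t + x for t, x in zip(total, term)]
    return total

alpha, beta = 0.6, 0.8j         # any normalized pair will do
lhs = z_basis_form(alpha, beta)
rhs = bell_basis_form(alpha, beta)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print("largest amplitude difference:", max_gap)
```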


Next, we take action to make use of this alternate formulation of our system state. 
(Remember, we haven’t actually done anything yet.) 




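$\mathcal{A}$ Measures the Registers CA Along the Bell Basis


The last derivation demonstrated that

\[
\begin{aligned}
|\psi\rangle_C\, |\beta_{00}\rangle_{AB} \;=\;\;
 & |\beta_{00}\rangle_{CA} \left( \frac{\alpha|0\rangle_B + \beta|1\rangle_B}{2} \right)
 \;+\; |\beta_{01}\rangle_{CA} \left( \frac{\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) \\
 +\; & |\beta_{10}\rangle_{CA} \left( \frac{\alpha|0\rangle_B - \beta|1\rangle_B}{2} \right)
 \;+\; |\beta_{11}\rangle_{CA} \left( \frac{-\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) ,
\end{aligned}
\]

that is, the to-be-teleported $|\psi\rangle$ in Alice’s C-register seems to “show up” (in a
modified form) in $\mathcal{B}$’s B-register without anyone having taken any action; all we
did was rearrange the terms. However, this rearrangement is only valid if $\mathcal{A}$ intends
to measure the CA-register along the Bell basis. Such a measurement, as we have
seen, always has two parts.


1. $\mathcal{A}$ applies a $\mathrm{BELL}^\dagger$ gate to her CA-registers (the operator that
takes the non-standard basis to the z-basis).


2. $\mathcal{A}$ measures the resultant CA registers in a standard z-basis.


The first of these two parts, which effects the instantaneous transfer of $|\psi\rangle$ from
the C-channel to the B-channel, corresponds to the teleportation step of “the plan.” The
second part is where $\mathcal{A}$’s measurement selects one of the four
“near”-$|\psi\rangle$s for $\mathcal{B}$’s B-register.


The circuit $\mathcal{A}$ needs is

[Circuit: register C (upper wire) and register A (lower wire) enter $\mathrm{BELL}^\dagger$,
and each output wire ends in a meter.]

or more explicitly,

[Circuit: a CNOT controlled by register C and targeting register A, followed by H on
register C; both wires then end in meters.]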




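After applying the gate (but before the measurement), the original tripartite state,
$|\varphi\rangle^3$, will be transformed to

\[
\begin{aligned}
\big( \mathrm{BELL}^\dagger \otimes 1 \big)\, |\varphi\rangle^3 \;=\;\;
 & \mathrm{BELL}^\dagger |\beta_{00}\rangle_{CA} \otimes 1 \left( \frac{\alpha|0\rangle_B + \beta|1\rangle_B}{2} \right) \\
 +\; & \mathrm{BELL}^\dagger |\beta_{01}\rangle_{CA} \otimes 1 \left( \frac{\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) \\
 +\; & \mathrm{BELL}^\dagger |\beta_{10}\rangle_{CA} \otimes 1 \left( \frac{\alpha|0\rangle_B - \beta|1\rangle_B}{2} \right) \\
 +\; & \mathrm{BELL}^\dagger |\beta_{11}\rangle_{CA} \otimes 1 \left( \frac{-\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) \\[6pt]
 =\;\; & |00\rangle_{CA} \left( \frac{\alpha|0\rangle_B + \beta|1\rangle_B}{2} \right)
 \;+\; |01\rangle_{CA} \left( \frac{\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) \\
 +\; & |10\rangle_{CA} \left( \frac{\alpha|0\rangle_B - \beta|1\rangle_B}{2} \right)
 \;+\; |11\rangle_{CA} \left( \frac{-\beta|0\rangle_B + \alpha|1\rangle_B}{2} \right) .
\end{aligned}
\]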


Now when $\mathcal{A}$ measures her two qubits along the z-basis, she will actually be measuring
the pre-gate state along the Bell basis. After the measurement only one of the four terms
will remain (by collapse) and $\mathcal{B}$ will have a near-$|\psi\rangle$ left in his
B-register.
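
The effect of the gate can also be traced numerically. The sketch below (an illustration with hypothetical helper names and sample amplitudes) applies $\mathrm{BELL}^\dagger \otimes 1$ to $|\psi\rangle_C |\beta_{00}\rangle_{AB}$ and then groups the eight amplitudes by Alice's possible CA readings, giving the probability of each reading and the normalized ket that would be left in the B register.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matvec(m, v):
    """Matrix times column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

s = 1 / math.sqrt(2)
BELL_DAGGER = [
    [s, 0, 0, s],
    [0, s, s, 0],
    [s, 0, 0, -s],
    [0, s, -s, 0],
]
I2 = [[1, 0], [0, 1]]

def post_gate_state(alpha, beta):
    """(BELL-dagger on C,A) tensor (identity on B), applied to |psi>_C |beta_00>_AB."""
    beta_00 = [s, 0, 0, s]
    state = [c * ab for c in (alpha, beta) for ab in beta_00]   # index order c, a, b
    return matvec(kron(BELL_DAGGER, I2), state)

def branches(state):
    """Map each CA reading 'xy' to (probability, normalized B-register ket)."""
    result = {}
    for k in range(4):
        b_amps = state[2 * k: 2 * k + 2]        # amplitudes for b = 0 and b = 1
        prob = sum(abs(a) ** 2 for a in b_amps)
        norm = math.sqrt(prob)
        result[format(k, "02b")] = (prob, [a / norm for a in b_amps])
    return result

out = branches(post_gate_state(0.6, 0.8j))
for reading, (prob, ket) in out.items():
    print(reading, "probability", round(prob, 12), "B ket", ket)
```

Each reading comes up with probability 1/4 whatever $\alpha$ and $\beta$ are, which is the numerical face of the factor 1/2 carried by every term.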


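The circuit that describes $\mathcal{A}$’s local Bell basis measurement with $\mathcal{B}$’s
qubit going along for the ride is

[Circuit: $|\psi\rangle_C$ on the top wire and $|\beta_{00}\rangle\big|_A$ on the middle wire
pass through a CNOT (C controls, A is the target) and then H on C, and both end in meters;
$|\beta_{00}\rangle\big|_B$ on the bottom wire passes through untouched.]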


where the final meters are, as always, natural z-basis measurements. 


One Qubit of an Entangled Pair 


In case you hadn’t noticed, twice today we’ve seen something that might seem to be 
at odds with our previous lessons. I’m talking about a single, entangled, qubit being 
fed into one channel of a binary quantum gate, like 


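[Circuit: $|\psi\rangle_C$ (upper wire) and $|\beta_{00}\rangle\big|_A$ (lower wire) enter a
CNOT controlled by the upper wire, then H acts on the upper wire, and both wires end in
meters.]

or

[Circuit: $|\beta_{00}\rangle\big|_A$ (after $\mathcal{A}$’s encoding and transmission) and
$|\beta_{00}\rangle\big|_B$ enter the two inputs of $\mathrm{BELL}^\dagger$, and both outputs
end in meters.]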


Earlier, I admonished you to not expect to see — or be allowed to write — a non- 
separable bipartite state broken into two parts, each going into the individual channels 
of a binary gate. Rather we need to consider it as a non-separable entity going into 
both channels at once, as in: 


[Circuit: the whole state $|\psi\rangle^2$ enters both inputs of a binary gate U at once.]

However, the new notation that I have provided today, one half of an entangled qubit,

\[ |\psi\rangle^2\big|_A \quad \text{for the A register (or } \mathcal{A}\text{'s) qubit, and} \]

\[ |\psi\rangle^2\big|_B \quad \text{for the B register (or } \mathcal{B}\text{'s) qubit,} \]

allows us to write these symbols as individual inputs into either input of a binary
quantum gate without violating the cautionary note. Why? Because earlier, the
separate inputs we disallowed were individual components of a separable tensor (when
no such separable tensor existed). We were saying that you cannot mentally place a
tensor symbol, $\otimes$, between the two individual inputs. Here, the individual symbols
are not elements in the two component spaces, and there is no danger of treating
them as separable components of a bipartite state, and no $\otimes$ is implied.


$\mathcal{A}$ Sends Her Measurement Results to $\mathcal{B}$


$\mathcal{A}$ now has a two (classical) bit result of her measurement: “xy” = “00”, “01”, “10”
or “11.” She sends “xy” to $\mathcal{B}$ through a classical channel, which takes time to get
there.


$\mathcal{B}$ Uses the Received Message to Extract $|\psi\rangle$ from His Qubit


$\mathcal{B}$’s qubit is in one of the four collapsed states

\[
\begin{aligned}
 & \alpha|0\rangle_B + \beta|1\rangle_B \\
 & \beta|0\rangle_B + \alpha|1\rangle_B \\
 & \alpha|0\rangle_B - \beta|1\rangle_B \\
 -& \beta|0\rangle_B + \alpha|1\rangle_B
\end{aligned}
\]

This happened as a result of $\mathcal{A}$’s Bell basis measurement (instantaneous
faster-than-light speed transfer, ergo “teleportation”). That’s the 99.99% I spoke of earlier.
To get the final .01% of the way there, he needs to look at the two classical bits he
received (which took time to reach him). They tell him which of those four states
his qubit landed in. If it’s anything other than “00” he needs to apply a local unary
operator to his B-register to “fix it up,” so it will be in the original $|\psi\rangle$. The
rule is


    $\mathcal{B}$ Receives    $\mathcal{B}$ Applies    $\mathcal{B}$ Recovers
    ----------------------    ---------------------    ----------------------
    “00”                      (nothing)                $|\psi\rangle$
    “01”                      $X$                      $|\psi\rangle$
    “10”                      $Z$                      $|\psi\rangle$
    “11”                      $iY$                     $|\psi\rangle$
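
The fix-up rule is easy to check mechanically. The sketch below takes the four collapsed B-register kets listed above, applies the operator the table prescribes for each two-bit message, and confirms that $(\alpha, \beta)$ comes back every time. The sample amplitudes and the function names are assumptions made for illustration.

```python
def matvec(m, v):
    """Matrix times column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
iY = [[0, 1], [-1, 0]]          # i times the Pauli Y matrix
FIX = {"00": I2, "01": X, "10": Z, "11": iY}

def collapsed_b_state(alpha, beta, reading):
    """B-register ket left behind for each of Alice's two-bit readings."""
    return {
        "00": [alpha, beta],
        "01": [beta, alpha],
        "10": [alpha, -beta],
        "11": [-beta, alpha],
    }[reading]

def recover(alpha, beta, reading):
    """Apply Bob's fix-up operator for the message he received."""
    return matvec(FIX[reading], collapsed_b_state(alpha, beta, reading))

alpha, beta = 0.6, 0.8j
recovered = {r: recover(alpha, beta, r) for r in FIX}
for reading, ket in recovered.items():
    ok = abs(ket[0] - alpha) < 1e-12 and abs(ket[1] - beta) < 1e-12
    print("message", reading, "-> recovered original qubit:", ok)
```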


Nothing helps make an abstract description concrete like doing a calculation, so I 
recommend an... 


[Exercise.


i) Express the operator $\mathrm{BELL}^\dagger \otimes 1$ as an 8 × 8 matrix with the help of
the section “The Matrix of a Separable Operator,” in the lesson on tensor products.


ii) Express the state $|\varphi\rangle^3 = (\alpha|0\rangle + \beta|1\rangle)\,|\beta_{00}\rangle$
as an 8 × 1 column vector by multiplying it out (use this initial state description, not the
“re-arranged” version that used the Bell basis).


iii) Multiply the 8 × 8 operator matrix by the 8 × 1 state vector to get $\mathcal{A}$’s output
state (prior to measurement) in column vector form.


iv) Expand the last answer along the z-basis.


v) Factor that last result in such a way that it looks the same as the answer we
got when we applied $\mathrm{BELL}^\dagger \otimes 1$ to the “re-arranged” version of
$|\varphi\rangle^3$.


vi) In the case where $\mathcal{B}$ receives the classical message “10” from $\mathcal{A}$,
apply the corresponding “fix-it-up operator” shown in the table to his collapsed qubit and
thereby prove that he recovers the exact teleported state
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$.]




A Circuit Representation of Quantum Teleportation 


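Just as we used “SD” to be one of four possible operators in the superdense coding
algorithm, we will use “QT” to mean one of four operators (1, X, Z, iY) that $\mathcal{B}$ must
apply to his register based on the message he receives from $\mathcal{A}$. With this shorthand,
the quantum teleportation circuit can be expressed as:

[Circuit: $|\psi\rangle_C$ and $|\beta_{00}\rangle\big|_A$ pass through a CNOT (C controls, A is
the target) and H on C, and both end in meters. The two classical results travel (double lines
and wavy lines) to $\mathcal{B}$, where they control (filled circles) the QT gate acting on
$|\beta_{00}\rangle\big|_B$; the output of that wire is $|\psi\rangle$.]

The circuit says that after taking the measurements (the meter symbols), $\mathcal{A}$ “radios”
the classical data (double lines and wavy lines) to $\mathcal{B}$, who uses it to control
(filled circles) which of the four operations he will apply to his qubit.


Once again, we use the identity

\[ ZX \;=\; iY \]

to more precisely define the QT operation. We place controlled-X and controlled-Z
gates into $\mathcal{B}$’s pipeline to get an improved circuit description.

[Circuit: the same teleportation circuit, with QT replaced on the B register by an X gate
followed by a Z gate, each controlled by one of the two classical bits received from
$\mathcal{A}$; per the fix-up table, the A-register reading controls X and the C-register
reading controls Z. The output of the B wire is $|\psi\rangle$.]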


(Don’t forget that operators are applied from left-to-right in circuits, but right-to-left 
in algebra.) 

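Many authors go a step further and add the initial gate that creates the AB-channel
Bell state $|\beta_{00}\rangle$ from CBS kets:

[Circuit: two $|0\rangle$ inputs; H acts on the upper wire, followed by a CNOT controlled by
the upper wire and targeting the lower wire; the output pair is $|\beta_{00}\rangle$.]

which leads to

[Circuit: inputs C: $|\psi\rangle$, A: $|0\rangle$, B: $|0\rangle$. H on A and then a CNOT from
A to B create $|\beta_{00}\rangle_{AB}$; next come a CNOT from C to A and H on C, with meters
on C and A. The two classical results are transmitted to $\mathcal{B}$, where they control an X
gate followed by a Z gate on B, leaving $|\psi\rangle$ in register B.]

Here is the circuit with access points marked for an exercise:

[Circuit: the same three-wire circuit, with six access points marked along the wires from left
to right: P, Q, R, S, T, U.]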
Observe that the tripartite state

\[ |\varphi\rangle^3 \;=\; |\psi\rangle_C\, |0\rangle_A\, |0\rangle_B \]

going into the entire circuit is transformed by various gates and measurements along
the way. It continues to exist as a tripartite state to the very end, but you may not
recognize it as such due to the classical wires and transmission of classical information
around access points R and S, seemingly halting the qubit flow to their right. Yet
the full order-3 state lives on. It is simply unnecessary to show the full state beyond
that point, because registers C and A, after collapse, will contain one of the four CBS
kets, $|x\rangle_C\, |y\rangle_A$, for xy = 00, 01, 10 or 11. But those two registers never
change after the measurement, and when Bob applies an operator to his local register B, say
$iY$ perhaps, he will be implicitly applying the separable operator
$1 \otimes 1 \otimes iY$ to the full separable tripartite state.


[Exercise. Using natural coordinates for everything, compute the state of the
vector $|\varphi\rangle^3$ as it travels through the access points, P-U: $|\varphi\rangle^3_P$,
$|\varphi\rangle^3_Q$, $|\varphi\rangle^3_R$, $|\varphi\rangle^3_S$, $|\varphi\rangle^3_T$ and
$|\varphi\rangle^3_U$. For points S, T and U you will have to know what measurement
$\mathcal{A}$ reads and sends to $\mathcal{B}$, so do those three points twice, once for a
reading of CA = “01” and once for a reading of CA = “11”. HINT: Starting with the easy point P,
apply transformations carefully to the basis kets using separable notation like
$(1 \otimes \mathrm{BELL})$ or $(\mathrm{BELL}^\dagger \otimes 1)$. When you get to
post-measurement classical pipes, apply the Born Rule, which will select exactly one term in
the sum.]




Why Teleportation Works. 


Consider the main steps in the teleportation algorithm. We begin with three channels, 
the first of which contains all the quantum information we want to teleport, and the 
last two none of it, 


\[ |\varphi\rangle^3 \;=\; |\psi\rangle_C\, |\beta_{00}\rangle_{AB} . \]

Once we have the idea to entangle channels A and C by converting to the Bell basis
(perhaps driven by the fact that one of the Bell states is in the AB register pair) we
end up with a state in the general form,

\[
\begin{aligned}
|\varphi\rangle^3 \;=\;\; & |\beta_{00}\rangle_{CA}\,|\psi_{00}\rangle_B \;+\; |\beta_{01}\rangle_{CA}\,|\psi_{01}\rangle_B \\
 +\; & |\beta_{10}\rangle_{CA}\,|\psi_{10}\rangle_B \;+\; |\beta_{11}\rangle_{CA}\,|\psi_{11}\rangle_B .
\end{aligned}
\]

Without even looking at any of the four $|\psi_{jk}\rangle$ kets in the B-channel, we are
convinced that 100% of the $|\psi\rangle$ information is now sitting inside that register,
waiting to be tapped. Why?


The reason is actually quite simple.


Quantum gates, including basis transformations, are always unitary and thus
reversible. If the Bell-basis operator had failed to transfer all the $|\psi\rangle$
information into the B-register, then, since none of it is left in the AC-registers (they
contain only Bell states), there would be no hope of getting an inverse gate to recover our
starting state, which holds the full $|\psi\rangle$. Thus, producing an expression that left
channels A and C bereft of any trace of $|\psi\rangle$ information must necessarily produce a
B-channel that contains it all.


12.4 Introduction to Quantum Oracles 


Superdense coding and quantum teleportation may seem more like applications of 
quantum computation than quantum algorithms. They enable us to transmit data 
in a way that is not possible using classical engineering. In contrast, Deutsch’s 
little problem really feels algorithmic in nature and, indeed, its solution provides the 
template for the many quantum algorithms that succeed it. 


We take a short side-trip to introduce Boolean functions then construct our first 
quantum oracles used in Deutsch’s and later algorithms. 


12.4.1 Boolean Functions and Reversibility 


Classical unary gates, of which there are a grand total of four, contain both
reversible operators (1 and ¬) and irreversible ones (the [0]-op and [1]-op). Quantum
operators require reversibility due to their unitarity, so there are no quantum analogs
for the latter two. Binary gates provide even more examples of irreversible classical
operations for which there are no quantum counterparts.


These are specific examples of a general phenomenon that is more easily expressed 
in terms of classical Boolean functions. 


Boolean Functions. A Boolean function is a function that has one or more
binary digits (0 or 1) as input, and one binary digit as output.


From Unary Gates to Boolean Functions


A classical unary gate takes a single classical bit in and produces a single classical 
bit out. In the language of functions, it is nothing other than a Boolean function of 
one bit, i.e., 


\[ f : \{0, 1\} \;\longrightarrow\; \{0, 1\} , \]

or using the notation $\mathbb{B} = \{0, 1\}$ introduced in the single qubit lecture,

\[ f : \mathbb{B} \;\longrightarrow\; \mathbb{B} . \]

(As we defined it, $\mathbb{B}$ had a richer structure than that of a simple set; it had a
mod-2 addition operation $\oplus$ that we will find useful in the definitions and computations
to come.)


Reversible Example. To avoid becoming unmoored by abstractions, we revisit
the negation operator in the language of Boolean functions. If we define

\[ f(x) \;=\; \neg x , \quad \text{for } x \in \mathbb{B} , \]

then in terms of a truth table, the Boolean function f is

    x | f(x)
    --+-----
    0 |  1
    1 |  0

This f is reversible, and in fact is its own inverse, since f(f(x)) = x.


Irreversible Example. On the other hand, if we define

\[ g(x) \;=\; 1 , \quad \text{for } x \in \mathbb{B} , \]

with a truth table of

    x | g(x)
    --+-----
    0 |  1
    1 |  1

we have the quintessential example of an irreversible function.
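
The count of four unary functions, and which of them can be undone, can be confirmed by brute force. In this short sketch the dictionary labels are informal names chosen here (the lecture calls the constant functions the [0]-op and [1]-op).

```python
# The four Boolean functions of one bit.
UNARY = {
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
    "constant-0": lambda x: 0,
    "constant-1": lambda x: 1,
}

def is_reversible(f, inputs=(0, 1)):
    """A function on a finite set is reversible exactly when no two inputs share an output."""
    outputs = [f(x) for x in inputs]
    return len(set(outputs)) == len(outputs)

reversible = {name: is_reversible(f) for name, f in UNARY.items()}
for name, flag in reversible.items():
    print(name, "reversible:", flag)
```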


Boolean functions (classical implied) will now become the subject of study. Com- 
puter science seeks to answer questions about such functions or create Boolean func- 
tions that do useful things. On the other hand, we will be using quantum circuits 
composed of quantum operators to answer questions about these classical Boolean 
functions. In other words, we have not abandoned the classical functions (I'll drop 
the modifier “Boolean” for now) in the least. On the contrary: they are the principal
players in our narrative. 


Binary Gates as Boolean Functions 


The language naturally extends to functions of more than one input bit. To keep 
things simple, let’s talk about two bits. 


A two bit function (classical and Boolean implied) takes two bits in and produces 
one bit out. In other words, 


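$$ f : \left\{ \binom{0}{0},\ \binom{0}{1},\ \binom{1}{0},\ \binom{1}{1} \right\} \longrightarrow \{0, 1\}, $$

or in B notation,

$$ f : B^2 \longrightarrow B. $$

(Column vs. row is not important here, so I’ll use whichever fits better into the written
page without the transpose, $(\;)^t$, baggage.)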


Note that we are avoiding the term “binary,” replacing it instead with “two bit” 
to avoid confusion arising from the fact that we are using binary digits for every input 
slot, whether a unary input or a multi-bit input. 

Irreversibility in the Two (or Greater) Bit Case. Since two-input Boolean
functions, like all Boolean functions, have a single bit out, they are inherently
irreversible; we cannot undo the destruction that results from the loss of one or
more bits.

(Necessarily) Irreversible Example. A typical two bit function that, like all
functions of two or more bits, is necessarily irreversible is the XOR, i.e.,

$$ f(x,y) = x \oplus y, \quad \text{for } \binom{x}{y} \in B^2, $$

with the truth table

(x, y) | f(x, y)
-------+--------
(0, 0) |   0
(0, 1) |   1
(1, 0) |   1
(1, 1) |   0


12.4.2 The Quantum Oracle of a Boolean Function 


Although our quantum algorithms will use quantum gates, they will often have to 
incorporate the classical functions that are the center of our investigations. But 
how can we do this when all quantum circuits are required to use unitary — and 
therefore reversible — gates? There is a well known classical technique for turning an 
otherwise irreversible function into one that is reversible. The technique pre-dates 
quantum computing, but we’ll look at it only in the quantum context. If you’re
interested in the classical analog, you can mentally “down-convert” ours by ignoring
its superposition capability and focusing only on the CBS inputs.


Oracles for Unary Functions 


Suppose we are given a black box that computes some unary function, f(x), even one
that may be initially unknown to us. The term black box suggests that we don’t know
what’s on the inside or how it works.

    x ──▶[ f ]──▶ f(x)


It can be shown that, using this black box — along with certain fundamental quantum 
gates — one can build a new gate that 


• takes two bits in,

• has two bits out,

• is unitary (and therefore reversible),

• computes the function f when presented the proper input, and

• does so with the same efficiency (technically, the same computational complexity,
  a term we will define in a later lesson), as the black box f, whose irreversible
  function we want to reproduce.


We won’t describe how this works but, instead, take it as a given and call the new,
larger circuit “$U_f$,” the quantum oracle for f. Its action on CBS kets and its circuit
diagram are defined by

                        ┌───────┐
  Data register:   |x⟩ ─┤       ├─ |x⟩
                        │  U_f  │
Target register:   |y⟩ ─┤       ├─ |y ⊕ f(x)⟩
                        └───────┘

which also gives the name data register to the A-channel and target register to the
B-channel.


First, notice that the output of the target register is a CBS; inside the ket we 
are XOR-ing two classical binary values y and f(x), producing another binary value 
which, in turn, defines a CBS ket: either $|0\rangle$ or $|1\rangle$.




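Example. We compute the matrix for $U_f$ when f(x) = 0, the constant (and
irreversible) [0]-op. Starting with the construction of the matrix of any linear
transformation and moving on from there,

$$
\begin{aligned}
U_f &= \Big( U_f|00\rangle,\; U_f|01\rangle,\; U_f|10\rangle,\; U_f|11\rangle \Big) \\
    &= \Big( |0\rangle|0 \oplus f(0)\rangle,\; |0\rangle|1 \oplus f(0)\rangle,\; |1\rangle|0 \oplus f(1)\rangle,\; |1\rangle|1 \oplus f(1)\rangle \Big) \\
    &= \Big( |0\rangle|f(0)\rangle,\; |0\rangle|\overline{f(0)}\rangle,\; |1\rangle|f(1)\rangle,\; |1\rangle|\overline{f(1)}\rangle \Big),
\end{aligned}
$$

where we use the alternate notation for negation, $\overline{a} = \neg a$. So far,
everything we did applies to the quantum oracle for any function f, so we’ll put a pin
in it for future use. Now, going on to apply it to f = [0]-op,

$$
U_{[0]\text{-op}} = \Big( |0\rangle|0\rangle,\; |0\rangle|1\rangle,\; |1\rangle|0\rangle,\; |1\rangle|1\rangle \Big)
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
$$

an interesting result in its own right, $U_{[0]\text{-op}} = \mathbb{1}$, but nothing
to which we should attribute any deep meaning. Do note, however, that such a nice
result makes it self-evident that this $U_f$ is not only unitary but its own inverse,
as we show next it always will be.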
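The column-by-column construction above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper names `oracle_matrix` and `matmul` are illustrative, and the encoding of the CBS ket |x⟩|y⟩ as the index 2x + y is an assumption chosen to match the column order |00⟩, |01⟩, |10⟩, |11⟩ used above.

```python
def oracle_matrix(f):
    """Return the 4x4 matrix of U_f for a unary Boolean function f.

    Column 2*x + y is the image of the CBS ket |x>|y>, which by
    definition is the CBS ket |x>|y XOR f(x)>.
    """
    matrix = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            col = 2 * x + y               # encodes the input  |x>|y>
            row = 2 * x + (y ^ f(x))      # encodes the output |x>|y XOR f(x)>
            matrix[row][col] = 1
    return matrix


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


IDENTITY = [[int(i == j) for j in range(4)] for i in range(4)]


def zero_op(x):
    """The constant (irreversible) [0]-op."""
    return 0


def negation(x):
    """The reversible negation operator."""
    return 1 - x


u_zero = oracle_matrix(zero_op)
u_not = oracle_matrix(negation)

print(u_zero == IDENTITY)                 # the [0]-op oracle is the identity
print(matmul(u_not, u_not) == IDENTITY)   # applying U_f twice undoes it
```

Because every column is a CBS ket and no two columns coincide, each such matrix is a permutation matrix, which is one way to see why it is unitary.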


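$U_f$ is Always its Own Inverse. We compute on the tensor CBS, and the result
will be extensible to the entire $\mathcal{H} \otimes \mathcal{H}$ by linearity:

$$
\begin{aligned}
(U_f U_f)\,|xy\rangle &= U_f\big( U_f\,|xy\rangle \big) = U_f\big( |x\rangle\,|y \oplus f(x)\rangle \big) \\
&= |x\rangle\,\big|\big(y \oplus f(x)\big) \oplus f(x)\big\rangle \\
&= |x\rangle\,\big|y \oplus \big(f(x) \oplus f(x)\big)\big\rangle = |x\rangle\,|y\rangle = |xy\rangle \qquad \text{QED}
\end{aligned}
$$

[Exercise. Why is $f(x) \oplus f(x) = 0$?]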


$U_f$ Computes f(x). This is a simple consequence of the circuit definition,
because if we plug 0 → y, we get

           ┌───────┐
      |x⟩ ─┤       ├─ |x⟩
           │  U_f  │
      |0⟩ ─┤       ├─ |f(x)⟩
           └───────┘

[Exercise. What do we get if we plug 1 → y?]

[Exercise. Compute the quantum oracles for the other three unary functions
and observe that their matrices reveal them each to be unitary, not to mention
self-inverses.]




Notice that the output of $U_f$ for a CBS is always a separable state

$$ |x\rangle \otimes |y \oplus f(x)\rangle, $$

since $y \oplus f(x)$ is always either 0 or 1.

[Exercise. CBS input kets producing separable (even CBS) output kets do not a
separable operator make. Prove this by looking at the three $U_f$s you computed in
the last exercise, and finding two which are patently non-separable.]


Oracles for Functions of Two Boolean Inputs 


Everything extends smoothly to gates of more than one bit, so we need only outline
the analysis for two qubits. We are given a black box for a two-input Boolean function,
$f(x_0, x_1)$,

    (x0, x1) ──▶[ f ]──▶ f(x0, x1)

This time, we assume that circuit theory enables us to build a three-in, three-out
oracle, $U_f$, defined by

           ┌───────┐
     |x0⟩ ─┤       ├─ |x0⟩
     |x1⟩ ─┤  U_f  ├─ |x1⟩
      |y⟩ ─┤       ├─ |y ⊕ f(x0, x1)⟩
           └───────┘

usually shortened by using the encoded form of the CBS kets, $|x\rangle^2$, where
$x \in \{0, 1, 2, 3\}$,

           ┌───────┐
     |x⟩² ─┤       ├─ |x⟩²
           │  U_f  │
      |y⟩ ─┤       ├─ |y ⊕ f(x)⟩
           └───────┘


The key points are the same:

• $U_f$ is its own inverse.

• $U_f$ emulates f(x) by setting y = 0,

  $$ U_f\big( |x\rangle^2\,|0\rangle \big) = |x\rangle^2\,|f(x)\rangle. $$

[Exercise. Compute the quantum oracle (in matrix form) for the classical AND
gate.]
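Both key points can be confirmed on the CBS kets without writing out an 8 × 8 matrix. This is a small sketch under stated assumptions: `apply_oracle` and `xor` are hypothetical helper names, the data register is encoded as an integer x in {0, 1, 2, 3}, and XOR is used as the sample function so that the AND exercise above is left untouched.

```python
def apply_oracle(f, x, y):
    """Action of U_f on the CBS ket |x>|y>: returns the labels (x, y XOR f(x))."""
    return x, y ^ f(x)


def xor(x):
    """Two-input XOR on the encoded pair x in {0, 1, 2, 3}.

    XOR is symmetric, so it does not matter which bit we call x0 or x1.
    """
    high, low = (x >> 1) & 1, x & 1
    return high ^ low


cbs_inputs = [(x, y) for x in range(4) for y in (0, 1)]

# Key point 1: applying the oracle twice returns every CBS ket to itself.
self_inverse = all(
    apply_oracle(xor, *apply_oracle(xor, x, y)) == (x, y) for x, y in cbs_inputs
)

# Key point 2: with y = 0 the target register ends up holding f(x).
emulates_f = all(apply_oracle(xor, x, 0) == (x, xor(x)) for x in range(4))

print(self_inverse, emulates_f)
```

Since the oracle is linear, agreement on all eight CBS kets extends to every state in the three-qubit space.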




12.5 Deutsch’s Problem 


Our first quantum algorithm answers a question about an unknown unary function 
f(x). It does not find the exact form of this function, but seeks only to answer a 
general question about its character. Specifically, we ask whether the function is 
one-to-one (distinct inputs produce distinct outputs) or constant (both inputs are 
mapped to the same output.) 


Obviously, we can figure this out by evaluating both f(0) and f(1), after which 
we would know the answer, not to mention have a complete description of f. But 
the point is to see what we can learn about f without doing both evaluations of the 
function; we only want to do one evaluation. In a classical world, if we only get to
query f once, we have to choose between inputs 0 and 1, and getting the output for our
choice will not tell us whether the function is one-to-one or constant.
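To make the classical limitation concrete, here is a short sketch (the names `UNARY`, `kind` and `single_query_is_ambiguous` are illustrative, not from the text). For either choice of query input, and for either observed output, it finds both a constant and a one-to-one function consistent with what was seen.

```python
# The four unary Boolean functions.
UNARY = {
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
    "[0]-op": lambda x: 0,
    "[1]-op": lambda x: 1,
}


def kind(f):
    """Classify a unary function by evaluating it on both inputs."""
    return "constant" if f(0) == f(1) else "one-to-one"


def single_query_is_ambiguous(query):
    """True if every possible answer to f(query) fits both kinds of function."""
    for observed in (0, 1):
        consistent = {kind(f) for f in UNARY.values() if f(query) == observed}
        if consistent != {"constant", "one-to-one"}:
            return False
    return True


print(single_query_is_ambiguous(0), single_query_is_ambiguous(1))
```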


All the massive machinery we have accumulated in the past weeks can be brought 
to bear on this simple problem very neatly to demonstrate how quantum parallelism 
will beat classical computing in certain problems. It will set the stage for all quantum 
algorithms. 


12.5.1 Definitions and Statement of the Problem 


For this and a subsequent algorithm, we will define a property that a Boolean function 
might (or might not) have. We continue to assume that function means Boolean 
function. 


Balanced and Constant Functions 


Balanced Function. A balanced function is one that takes on the value 0 for exactly 
half of the possible inputs (and therefore 1 on the other half). 


Two examples of balanced functions of two inputs are XOR and $1_y : (x, y) \mapsto y$:

(x, y) | XOR(x, y)        (x, y) | 1_y(x, y)
-------+----------        -------+----------
(0, 0) |    0             (0, 0) |    0
(0, 1) |    1             (0, 1) |    1
(1, 0) |    1             (1, 0) |    0
(1, 1) |    0             (1, 1) |    1

Two unbalanced functions of two inputs are AND and the [1]-op:

(x, y) | AND(x, y)        (x, y) | [1](x, y)
-------+----------        -------+----------
(0, 0) |    0             (0, 0) |    1
(0, 1) |    0             (0, 1) |    1
(1, 0) |    0             (1, 0) |    1
(1, 1) |    1             (1, 1) |    1


Constant Functions. Constant functions are functions that always produce the
same output regardless of the input. There are only two constant functions for any
number of inputs: either the [0]-op or the [1]-op. See the truth table for the [1]-op,
above; the truth table for the [0]-op would, of course, have 0s in the right column
instead of 1s.
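The two definitions translate directly into a count over the truth table. The sketch below is only illustrative (the function name `classify` and the string labels are assumptions); it reproduces the four examples tabulated above.

```python
from itertools import product


def classify(f, n):
    """Label an n-input Boolean function as 'constant', 'balanced' or 'neither'."""
    outputs = [f(*bits) for bits in product((0, 1), repeat=n)]
    ones = sum(outputs)
    if ones == 0 or ones == len(outputs):
        return "constant"
    if 2 * ones == len(outputs):
        return "balanced"          # value 1 on exactly half of the inputs
    return "neither"


print(classify(lambda x, y: x ^ y, 2))   # XOR
print(classify(lambda x, y: y, 2))       # 1_y
print(classify(lambda x, y: x & y, 2))   # AND
print(classify(lambda x, y: 1, 2))       # [1]-op
```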


Balanced and Constant Unary Functions

There are only four unary functions, so the terms balanced and constant might
seem heavy handed. The two constant functions are obviously the [0]-op and the [1]-op,
and the other two are balanced. In fact, the balanced unary functions already have
a term that describes them: one-to-one. There’s an even simpler term for balanced
functions in the unary case: not constant. To see this, let’s lay all of our cards “on
the table,” pun intended.

x | 𝟙(x)      x | ¬x      x | [0](x)      x | [1](x)
--+-----      --+---      --+-------      --+-------
0 |  0        0 |  1      0 |   0         0 |   1
1 |  1        1 |  0      1 |   0         1 |   1

So exactly two of our unary ops are constant and the other two are balanced =
one-to-one = not constant.


The reason we complicate things by adding the vocabulary constant vs. balanced
is that we will eventually move on to functions of more than one input, and in those
cases,

• not all functions will be either balanced or one-to-one (e.g., the binary AND
  function isn’t either), and

• balanced functions will not be one-to-one (e.g., the binary XOR function is balanced
  but not one-to-one).


Deutsch’s Problem 


We are now ready to state Deutsch’s problem using vocabulary that will help when 
we go to higher-input functions. 




Deutsch’s Problem. Given an unknown unary function that we are told is either
balanced or constant, determine which it is in one query of the quantum oracle,
$U_f$.

Notice that we are not asking to determine the exact function, just which category 
it belongs to. Even so, we cannot do it classically in a single query. 


12.5.2 Deutsch’s Algorithm 


The algorithm consists of building a circuit and measuring the A-register — once. 
That’s it. Our conclusion about f is determined by the result of the measurement: 
if we get a “0” the function is constant, if we get a “1” the function is balanced. We
will have gotten an answer about f with only one query of the oracle and thereby
obtained a ×2 improvement over a classical algorithm.


The Circuit 


We combine the quantum oracle for f with a few Hadamard gates in a very small 
circuit: 


         ┌───────┐
|0⟩ ─[H]─┤       ├─[H]─[measure]
         │  U_f  │
|1⟩ ─[H]─┤       ├───── (ignore)
         └───────┘


Because there are only four unary functions, the temptation is to simply plug each 
one into $U_f$ and confirm our claim. That’s not a bad exercise (which I’ll ask you to
do), but let’s understand how one arrives at this design so we can use the ideas in 
other algorithms. 


The Main Ideas Behind Deutsch’s Solution 


Classical computing is embedded in quantum computing when we restrict our
attention to the finite number of CBS kets that swim around in the infinite quantum
ocean of the full state space. For the simplest Hilbert space imaginable, the
first-order space $\mathcal{H} = \mathcal{H}_{(1)}$, those CBS kets consist of the two
natural basis vectors $\{\, |0\rangle, |1\rangle \,\}$, corresponding to the classical
bits [0] and [1]. We should expect that no improvements to classical computing can be
achieved by using only z-basis states (i.e., the CBS) for any algorithm or circuit.
Doing so would be using our Hilbert space as though it were the finite set
$\{\, |0\rangle, |1\rangle \,\}$, a state of affairs that does nothing but
simulate the classical world.


There are two quantum techniques that motivate the algorithm. 


#1: Quantum Parallelism. This is the more general of the two ideas and will
be used in all our algorithms. Any non-trivial superposition of the two CBS kets

$$ \alpha\,|0\rangle + \beta\,|1\rangle, \qquad \text{both } \alpha \text{ and } \beta \neq 0, $$

takes us off this classical plane into quantum hyperspace where all the fun happens.
When we send such a non-trivial superposition through the quantum oracle, we are
implicitly processing both z-basis kets, and therefore both classical states, [0] and
[1], simultaneously. This is the first big idea that fuels quantum computing and
explains how it achieves its speed improvements. (The second big idea is quantum
entanglement, but we’ll feature that one a little later.)

The practical impact of this technique in Deutsch’s algorithm is that we’ll be
sending a perfectly balanced (or maximally mixed) superposition,

$$ |0\rangle_x = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, $$

through the data register (the A-channel) of the oracle, $U_f$.

#2: The Phase Kick-Back Trick. This isn’t quite as generally applicable as
quantum parallelism, but it plays a role in several algorithms including some we'll
meet later in the course. It goes like this. If we feed the other maximally mixed state,

$$ |1\rangle_x = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, $$

into the target register (the B-channel) of $U_f$, we can transfer (or kick back) 100%
of the information about the unknown function f(x) from the B-register output to
the A-register output.


You've actually experienced this idea earlier today when you studied quantum
teleportation. Recall that by merely rearranging the initial configuration of our input
state we were able to effect a seemingly magical transfer of $|\psi\rangle$ from one
channel to the other. In the current context, presenting the x-basis ket,
$|1\rangle_x$, to the target register will have a similar effect.


Temporary Change in Notation 


Because we are going to make heavy use of the x-basis kets here and the variable x is
being used as the Boolean input to the function f(x), I am going to call into action
our alternate x-basis notation,

$$ |+\rangle \equiv |0\rangle_x \quad \text{and} \quad |-\rangle \equiv |1\rangle_x. $$


Preparing the Oracle’s Input: The Two Left Hadamard Gates 


Together, the two techniques explain the first part of Deutsch’s circuit (in the
dashed box),

     ┌┄┄┄┄┄┐  ┌───────┐
|0⟩ ─┊─[H]─┊──┤       ├─[H]─[measure]
     ┊     ┊  │  U_f  │
|1⟩ ─┊─[H]─┊──┤       ├───── (ignore)
     └┄┄┄┄┄┘  └───────┘

We recognize H as the operator that takes z-basis kets to x-basis kets, thus
manufacturing a $|+\rangle$ (i.e., $|0\rangle_x$) for the data register input and
$|-\rangle$ (i.e., $|1\rangle_x$) for the target register input,

|0⟩ ─[H]─ |+⟩

|1⟩ ─[H]─ |−⟩


In other words, the Hadamard gate converts the two natural basis kets (easy states to
prepare) into superposition inputs for the quantum oracle. The top gate sets up quantum
parallelism for the circuit, and the bottom one sets up the phase kick-back. For
reference, algebraically these two gates perform

$$
H\,|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} = |+\rangle
\quad \text{and} \quad
H\,|1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} = |-\rangle.
$$
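The two identities are easy to confirm numerically. This is a minimal sketch using plain Python lists for the coordinates in the z-basis; the names `H`, `apply`, `plus` and `minus` are illustrative only.

```python
import math

S = 1 / math.sqrt(2)

# Matrix of the Hadamard gate in the natural (z) basis.
H = [[S, S],
     [S, -S]]


def apply(matrix, ket):
    """Multiply a 2x2 matrix by a 2-component coordinate vector."""
    return [sum(m * k for m, k in zip(row, ket)) for row in matrix]


KET_0 = [1.0, 0.0]
KET_1 = [0.0, 1.0]

plus = apply(H, KET_0)    # coordinates of |+>
minus = apply(H, KET_1)   # coordinates of |->

print(plus, minus)
```

Applying `H` a second time returns the original z-basis kets (up to rounding), which is why the same gate can later be used to convert an x-basis measurement into a z-basis one.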


Analyzing the Oracle 


The real understanding of how the algorithm works comes by analyzing the kernel of
the circuit, the oracle (in the dashed box),

         ┌┄┄┄┄┄┄┄┄┄┄┄┐
         ┊ ┌───────┐ ┊
|0⟩ ─[H]─┊─┤       ├─┊─[H]─[measure]
         ┊ │  U_f  │ ┊
|1⟩ ─[H]─┊─┤       ├─┊───── (ignore)
         ┊ └───────┘ ┊
         └┄┄┄┄┄┄┄┄┄┄┄┘

Step 1. CBS Into Both Channels. We creep up slowly on our result by first
considering a CBS ket into both registers, a result we know immediately by definition
of $U_f$,

                        ┌───────┐
  Data register:   |x⟩ ─┤       ├─ |x⟩
                        │  U_f  │
Target register:   |y⟩ ─┤       ├─ |y ⊕ f(x)⟩
                        └───────┘

or algebraically,

$$ U_f\big( |x\rangle\,|y\rangle \big) = |x\rangle\,|y \oplus f(x)\rangle. $$


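Step 2. CBS Into Data and Superposition into Target. We stick with a
CBS $|x\rangle$ going into the data register, but now allow the superposition
$|-\rangle$ to go into the target register. Extend the above linearly,

$$
\begin{aligned}
U_f\big( |x\rangle\,|-\rangle \big)
&= U_f\left( |x\rangle\,\frac{|0\rangle - |1\rangle}{\sqrt{2}} \right)
 = \frac{U_f\big( |x\rangle|0\rangle \big) - U_f\big( |x\rangle|1\rangle \big)}{\sqrt{2}} \\
&= \frac{|x\rangle\,|0 \oplus f(x)\rangle - |x\rangle\,|1 \oplus f(x)\rangle}{\sqrt{2}}
 = \frac{|x\rangle\,|f(x)\rangle - |x\rangle\,|\overline{f(x)}\rangle}{\sqrt{2}} \\
&= |x\rangle \left( \frac{|f(x)\rangle - |\overline{f(x)}\rangle}{\sqrt{2}} \right).
\end{aligned}
$$

This amounts to

$$
U_f\big( |x\rangle\,|-\rangle \big)
= |x\rangle \cdot
\begin{cases}
\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}, & \text{when } f(x) = 0 \\[2ex]
\dfrac{|1\rangle - |0\rangle}{\sqrt{2}}, & \text{when } f(x) = 1
\end{cases}
\;=\; |x\rangle\,(-1)^{f(x)} \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right).
$$

Since it’s a scalar, $(-1)^{f(x)}$ can be moved to the left and be attached to the
A-register’s $|x\rangle$, a mere rearrangement of the terms,

$$
U_f\big( |x\rangle\,|-\rangle \big)
= \Big( (-1)^{f(x)}\,|x\rangle \Big) \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right)
= \Big( (-1)^{f(x)}\,|x\rangle \Big)\,|-\rangle,
$$

and we have successfully (like magic) moved all of the information about f(x) from
the B-register to the A-register, where it appears as an overall phase factor in the
scalar’s exponent, $(-1)^{f(x)}$.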


The Oracle’s part of the circuit would process this intermediate step’s data as
follows.

           ┌───────┐
      |x⟩ ─┤       ├─ (−1)^f(x) |x⟩
           │  U_f  │
      |−⟩ ─┤       ├─ |−⟩
           └───────┘


Although we have a ways to go, let’s pause to summarize what we have accom- 
plished so far. 


• Quantum Mechanics: This is a non-essential, theoretical observation to test
  your memory of our quantum mechanics lesson. We have proven that
  $|x\rangle\,|-\rangle$ is an eigenvector of $U_f$ with eigenvalue $(-1)^{f(x)}$
  for x = 0, 1.

• Quantum Computing: The information about f(x) is encoded, or “kicked back,”
  in the A (data) register’s output. That’s where we plan to look for it
  in the coming step. Viewed this way, the B-register retains no useful
  information; just like in teleportation, a rearrangement of the data sometimes
  creates a perceptual shift of information from one channel to another that we can
  exploit by measuring along a different basis, something we will do in a moment.
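The eigenvector observation can be verified for all four unary functions. The following is a sketch under assumptions of my own choosing: two-qubit states are stored as four real amplitudes indexed by 2x + y, and the helper names `apply_oracle`, `ket_x_minus` and `is_kicked_back` are hypothetical.

```python
import math


def apply_oracle(f, state):
    """Apply U_f to a two-qubit state given as 4 amplitudes indexed by 2*x + y."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (y ^ f(x))] += state[2 * x + y]
    return out


def ket_x_minus(x):
    """Amplitudes of |x>|->, i.e. |x> tensor (|0> - |1>)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    state = [0.0] * 4
    state[2 * x] = s
    state[2 * x + 1] = -s
    return state


def is_kicked_back(f, x):
    """Check that U_f(|x>|->) equals (-1)**f(x) times |x>|->."""
    before = ket_x_minus(x)
    after = apply_oracle(f, before)
    phase = (-1) ** f(x)
    return all(math.isclose(a, phase * b, abs_tol=1e-12)
               for a, b in zip(after, before))


unary_functions = [lambda x: x, lambda x: 1 - x, lambda x: 0, lambda x: 1]
print(all(is_kicked_back(f, x) for f in unary_functions for x in (0, 1)))
```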


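Step 3. Superpositions into Both Registers. Finally, we want the state
$|+\rangle$ to go into the data register so we can process both f(0) and f(1) in a
single pass. The effect is to present the separable $|+\rangle \otimes |-\rangle$ to
the oracle and see what comes out. Applying linearity to the last result we get

$$
\begin{aligned}
U_f\big( |+\rangle\,|-\rangle \big)
&= U_f\left( \frac{|0\rangle + |1\rangle}{\sqrt{2}}\;|-\rangle \right)
 = \frac{U_f\big( |0\rangle|-\rangle \big) + U_f\big( |1\rangle|-\rangle \big)}{\sqrt{2}} \\
&= \frac{(-1)^{f(0)}\,|0\rangle\,|-\rangle + (-1)^{f(1)}\,|1\rangle\,|-\rangle}{\sqrt{2}} \\
&= \left( \frac{(-1)^{f(0)}\,|0\rangle + (-1)^{f(1)}\,|1\rangle}{\sqrt{2}} \right) |-\rangle.
\end{aligned}
$$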
By combining the phase kick-back with quantum parallelism, we’ve managed to get an
expression containing both f(0) and f(1) in the A-register. We now ask the question
that Deutsch posed in the context of this simple expression, “What is the difference
between the balanced case ($f(0) \neq f(1)$) and the constant case ($f(0) = f(1)$)?”
Answer: When constant, the two terms in the numerator have the same sign and
when balanced, they have different signs, to wit,

$$
U_f\big( |+\rangle\,|-\rangle \big) =
\begin{cases}
(\pm 1)\,\dfrac{|0\rangle + |1\rangle}{\sqrt{2}}\;|-\rangle, & \text{if } f(0) = f(1) \\[2ex]
(\pm 1)\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\;|-\rangle, & \text{if } f(0) \neq f(1)
\end{cases}
$$

We don’t care about a possible overall phase factor of (−1) in front of all this since
it’s a unit scalar in a state space. Dumping it and noticing that the A-register has
x-basis kets in both cases, we get the ultimate simplification,

$$
U_f\big( |+\rangle\,|-\rangle \big) \cong
\begin{cases}
|+\rangle\,|-\rangle, & \text{if } f(0) = f(1) \\
|-\rangle\,|-\rangle, & \text{if } f(0) \neq f(1)
\end{cases}
$$

the perfect form for an x-basis measurement. Before we do that, let’s have a look at
the oracle’s input and output states,

           ┌───────┐
      |+⟩ ─┤       ├─ |+⟩ or |−⟩
           │  U_f  │
      |−⟩ ─┤       ├─ |−⟩
           └───────┘


Measurement 


We only care about the A-register, since the B-register will always collapse to
$|-\rangle$. The conclusion?

    After measuring the A-register along the x-basis, if we collapse to |+⟩, f
    is constant, and if we collapse to |−⟩, f is balanced.

Of course an x-basis measurement is nothing more than a z-basis measurement
after applying the x ↔ z basis transforming unitary H. This explains the insertion
of the final Hadamard gate in the upper right (dashed),

         ┌───────┐ ┌┄┄┄┄┄┐
|0⟩ ─[H]─┤       ├─┊─[H]─┊─[measure]
         │  U_f  │ ┊     ┊
|1⟩ ─[H]─┤       ├─┊─────┊─ (ignore)
         └───────┘ └┄┄┄┄┄┘


Deutsch’s Algorithm in Summary 


We've explained the purpose of all the components in the circuit and how each plays
a role in leveraging quantum parallelism and phase kick-back. The result is extremely
easy to state. We run the circuit

         ┌───────┐
|0⟩ ─[H]─┤       ├─[H]─[measure]
         │  U_f  │
|1⟩ ─[H]─┤       ├───── (ignore)
         └───────┘

one time only and measure the data register output in the natural basis.

• If we read “0,” f is constant.

• If we read “1,” f is balanced.

This may not seem like a game-changing result: a quantum speedup of 2× in a problem
that is both trivial and without any real world application. But it demonstrates
that there is a difference between quantum computing and classical computing. It
also lays the groundwork for the more advanced algorithms to come.
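The whole circuit is small enough to simulate with 4 × 4 matrices. What follows is a sketch rather than a definitive implementation: it assumes the index convention 2x + y for the CBS ket |x⟩|y⟩ (A-register first), uses illustrative helper names, and replaces the physical measurement by a computation of the probability that the A-register reads “1”.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
I2 = [[1.0, 0.0], [0.0, 1.0]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (first factor acts on the A-register)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a coordinate vector."""
    return [sum(r * c for r, c in zip(row, v)) for row in m]


def oracle_matrix(f):
    """Matrix of U_f: the CBS ket |x>|y> is sent to |x>|y XOR f(x)>."""
    m = [[0.0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            m[2 * x + (y ^ f(x))][2 * x + y] = 1.0
    return m


def deutsch(f):
    """Run Deutsch's circuit once and report 'constant' or 'balanced'."""
    state = [0.0, 1.0, 0.0, 0.0]               # |0>|1>
    state = matvec(kron(H, H), state)          # prepare |+>|->
    state = matvec(oracle_matrix(f), state)    # the single oracle query
    state = matvec(kron(H, I2), state)         # final Hadamard on the A-register
    prob_one = state[2] ** 2 + state[3] ** 2   # P(A-register reads "1")
    return "balanced" if prob_one > 0.5 else "constant"


for name, f in [("identity", lambda x: x), ("negation", lambda x: 1 - x),
                ("[0]-op", lambda x: 0), ("[1]-op", lambda x: 1)]:
    print(name, deutsch(f))
```

In the ideal circuit the probability is exactly 0 or 1, so the threshold of 0.5 only guards against floating-point rounding.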


12.6 n Qubits and More Algorithms


We’re just getting started, though. Next time we attack a general n-qubit computer 
and two algorithms that work on that kind of system, so get ready for more fun. 




Chapter 13 


Multi-Qubit Systems and 
Algorithms 


13.1 Moving Up from 2 Qubits to n Qubits 


This week we generalize our work with one- and two-qubit computation to include
n qubits for any integer n > 2. We'll start by defining nth order tensor products, a
natural extension of what we did for 2nd and 3rd order products, and then apply that
to nth order state spaces, $\mathcal{H}_{(n)}$.


It is common for non-math major undergraduates to be confounded by higher 
order tensors due to the vast number of coordinates and large dimensions, so I will 
give you a running start by doing a short recap of both the second and third order 
tensor products first. This will establish a pattern that should make the larger orders 
go down more smoothly. 


13.2 General Tensor Products


13.2.1 Recap of Order-2 Tensor Products


Objects of the Product Space and Induced Basis


We learned that the tensor product of two vector spaces, A and B, having dimensions
$d_A$ and $d_B$, respectively, is the product space,

$$ W = A \otimes B, $$

whose vectors (a.k.a. tensors) consist of objects, $\mathbf{w}$, expressible as
weighted sums of the separable basis, i.e.,

$$ \mathbf{w} = \sum_{k,\,j} c_{kj}\,\big( \mathbf{a}_k \otimes \mathbf{b}_j \big), $$

where the $c_{kj}$ are the scalar weights and also serve as the coordinates of
$\mathbf{w}$ along that basis.


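The separable basis tensors appearing in the above linear combination are the
$d_A d_B$ vectors

$$ \big\{\, \mathbf{a}_k \otimes \mathbf{b}_j \;\big|\; k = 0, \ldots, (d_A - 1) \ \text{and} \ j = 0, \ldots, (d_B - 1) \,\big\}, $$

induced by the two component bases,

$$ \mathcal{A} = \{ \mathbf{a}_k \}_{k=0}^{d_A - 1} \quad \text{and} \quad \mathcal{B} = \{ \mathbf{b}_j \}_{j=0}^{d_B - 1}. $$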


The sums, products and equivalence of tensor expressions were defined by the required 
distributive and commutative properties, but can often be taken as the natural rules 
one would expect. 


Separable Operators in the Product Space 


A separable operator on the product space is one that arises from two component
operators, $T_A$ and $T_B$, each defined on its respective component space, A and B.
This separable tensor operator is defined first by its action on separable order-2
tensors

$$ [T_A \otimes T_B]\,(\mathbf{a} \otimes \mathbf{b}) \equiv T_A(\mathbf{a}) \otimes T_B(\mathbf{b}), $$

and since the basis tensors are of this form, it establishes the action of
$T_A \otimes T_B$ on the basis which, in turn, extends the action to the whole space.
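In coordinates, the defining property says that the Kronecker product of the two matrices, applied to the Kronecker product of the two coordinate vectors, equals the Kronecker product of the separately transformed vectors. The sketch below checks this on one arbitrary example; the matrices, vectors and helper names are made up for illustration, and integer entries are used so the comparison is exact.

```python
def kron_vec(a, b):
    """Coordinates of a (x) b in the induced basis (index = i * len(b) + k)."""
    return [x * y for x in a for y in b]


def kron_mat(s, t):
    """Matrix of S (x) T in the induced basis."""
    return [[s[i][j] * t[k][l] for j in range(len(s[0])) for l in range(len(t[0]))]
            for i in range(len(s)) for k in range(len(t))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a coordinate vector."""
    return [sum(r * c for r, c in zip(row, v)) for row in m]


# Arbitrary sample operators and vectors with d_A = 2 and d_B = 3.
T_A = [[1, 2], [3, 4]]
T_B = [[0, 1, 0], [1, 0, 2], [2, 1, 1]]
a = [1, -1]
b = [2, 0, 1]

lhs = matvec(kron_mat(T_A, T_B), kron_vec(a, b))    # [T_A (x) T_B](a (x) b)
rhs = kron_vec(matvec(T_A, a), matvec(T_B, b))      # T_A(a) (x) T_B(b)

print(lhs == rhs, len(lhs))
```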


13.2.2 Recap of Order-3 Tensor Products


We also outlined the same process for a third-order tensor product space

$$ W = A \otimes B \otimes C $$

in order to acquire the vocabulary needed to present a few of the early quantum
algorithms involving three channels. Here is a summary of that section.


Objects of the Product Space and Induced Basis


Assuming A, B and C have dimensions $d_A$, $d_B$ and $d_C$, respectively, the basis
for the product space is the set of $d_A d_B d_C$ separable tensors

$$ \big\{\, \mathbf{a}_k \otimes \mathbf{b}_j \otimes \mathbf{c}_l \,\big\}, $$

where

    $\mathbf{a}_k$ is the kth vector in the basis for A,

    $\mathbf{b}_j$ is the jth vector in the basis for B and

    $\mathbf{c}_l$ is the lth vector in the basis for C.

A general tensor $\mathbf{w}$ in the product space is expressible as a weighted sum of
these basis tensors,

$$ \mathbf{w} = \sum_{k,\,j,\,l} c_{kjl}\,\big( \mathbf{a}_k \otimes \mathbf{b}_j \otimes \mathbf{c}_l \big), $$

where the $c_{kjl}$ are the scalar weights (or coordinates) that define $\mathbf{w}$.


Separable Operators in the Product Space 


A separable operator on the product space is one that arises from three component
operators, $T_A$, $T_B$ and $T_C$, each defined on its respective component space, A, B
and C. This separable tensor operator is defined first by its action on separable
order-3 tensors

$$ [T_A \otimes T_B \otimes T_C]\,(\mathbf{a} \otimes \mathbf{b} \otimes \mathbf{c}) \equiv T_A(\mathbf{a}) \otimes T_B(\mathbf{b}) \otimes T_C(\mathbf{c}), $$

and since the basis tensors are of this form, that establishes the action of
$T_A \otimes T_B \otimes T_C$ on the basis which, in turn, extends the action to the
whole space.


13.2.3 Higher Order Tensor Products


We now formally generalize these concepts to any order tensor product space

$$ W = A_0 \otimes A_1 \otimes \cdots \otimes A_{n-2} \otimes A_{n-1}, $$

for n ≥ 2. We’ll label the dimensions of the component spaces by

$$
\begin{aligned}
\dim(A_0) &= d_0, \\
\dim(A_1) &= d_1, \\
&\;\;\vdots \\
\dim(A_{n-2}) &= d_{n-2} \quad \text{and} \\
\dim(A_{n-1}) &= d_{n-1}.
\end{aligned}
$$

The tensor product space will have dimension

$$ \dim(W) = d_0\, d_1 \cdots d_{n-2}\, d_{n-1} = \prod_{k=0}^{n-1} d_k, $$

which seems really big (and is big in fields like general relativity), but for us each
component space is $\mathcal{H}$, which has dimension two, so dim(W) will be the still
large but at least palatable number $2^n$.


Objects of the Product Space and Induced Basis 


The vectors (a.k.a. tensors) of the space consist of those $\mathbf{w}$ expressible as
weighted sums of the separable basis

$$
\Big\{\, \mathbf{a}_{0k_0} \otimes \mathbf{a}_{1k_1} \otimes \mathbf{a}_{2k_2} \otimes \cdots \otimes \mathbf{a}_{(n-1)k_{n-1}} \,\Big\}_{k_0,\,k_1,\,\ldots,\,k_{n-1} \,=\, 0,\,0,\,\ldots,\,0}^{d_0-1,\; d_1-1,\; \ldots,\; d_{n-1}-1}.
$$

The somewhat daunting subscript notation says that

    $\mathbf{a}_{0k_0}$ is the $(k_0)$th vector in the basis for $A_0$,

    $\mathbf{a}_{1k_1}$ is the $(k_1)$th vector in the basis for $A_1$,

    $\mathbf{a}_{2k_2}$ is the $(k_2)$th vector in the basis for $A_2$,

    ⋮

    $\mathbf{a}_{(n-1)k_{n-1}}$ is the $(k_{n-1})$th vector in the basis for $A_{n-1}$.

If we write this algebraically, the typical $\mathbf{w}$ in W has a unique expansion
along the tensor basis weighted by the scalars $c_{k_0 k_1 \ldots k_{n-1}}$,

$$
\mathbf{w} = \sum c_{k_0 k_1 \ldots k_{n-1}} \big( \mathbf{a}_{0k_0} \otimes \mathbf{a}_{1k_1} \otimes \mathbf{a}_{2k_2} \otimes \cdots \otimes \mathbf{a}_{(n-1)k_{n-1}} \big).
$$

This notation is an order of magnitude more general than we need, but it is good
to have down for reference. We'll see that the expression takes on a much more
manageable form when we get into the state spaces of quantum computing.


The sums, products and equivalence of tensor expressions have definitions analo- 
gous to their lower-order prototypes. You’ll see examples as we go. 




Separable Operators in the Product Space 


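A separable operator on the product space is one that arises from n component
operators, $T_0, T_1, \ldots, T_{n-1}$, each defined on its respective component
space, $A_0, A_1, \ldots, A_{n-1}$. This separable tensor operator is defined first by
its action on separable order-n tensors

$$
[T_0 \otimes T_1 \otimes \cdots \otimes T_{n-1}]\,(\mathbf{v}_0 \otimes \mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_{n-1})
\equiv T_0(\mathbf{v}_0) \otimes T_1(\mathbf{v}_1) \otimes \cdots \otimes T_{n-1}(\mathbf{v}_{n-1}),
$$

and since the basis tensors are always separable,

$$ \mathbf{a}_{0k_0} \otimes \mathbf{a}_{1k_1} \otimes \cdots \otimes \mathbf{a}_{(n-1)k_{n-1}}, $$

this establishes the action of $T_0 \otimes T_1 \otimes \cdots \otimes T_{n-1}$ on the
basis,

$$
[T_0 \otimes T_1 \otimes \cdots \otimes T_{n-1}]\,\big( \mathbf{a}_{0k_0} \otimes \mathbf{a}_{1k_1} \otimes \cdots \otimes \mathbf{a}_{(n-1)k_{n-1}} \big)
= T_0(\mathbf{a}_{0k_0}) \otimes T_1(\mathbf{a}_{1k_1}) \otimes \cdots \otimes T_{n-1}(\mathbf{a}_{(n-1)k_{n-1}}),
$$

which, in turn, extends the action to the whole space.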


Notation 


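Sometimes we use the $\prod$ or $\bigotimes$ notation to shorten expressions. In these
forms, the product space would be written in one of the two equivalent ways

$$ W = \bigotimes_{k=0}^{n-1} A_k = \prod_{k=0}^{n-1} A_k, $$

the induced basis in one of

$$
\Big\{ \bigotimes_{j=0}^{n-1} \mathbf{a}_{jk_j} \Big\}_{k_0,\,k_1,\,\ldots,\,k_{n-1} \,=\, 0,\,0,\,\ldots,\,0}^{d_0-1,\; d_1-1,\; \ldots,\; d_{n-1}-1}
= \Big\{ \prod_{j=0}^{n-1} \mathbf{a}_{jk_j} \Big\}_{k_0,\,k_1,\,\ldots,\,k_{n-1} \,=\, 0,\,0,\,\ldots,\,0}^{d_0-1,\; d_1-1,\; \ldots,\; d_{n-1}-1},
$$

a separable operator as one of

$$ \bigotimes_{k=0}^{n-1} T_k = \prod_{k=0}^{n-1} T_k, $$

and a separable operator’s action on a separable state as either

$$ \Big[ \bigotimes_{k=0}^{n-1} T_k \Big] \Big( \bigotimes_{k=0}^{n-1} \mathbf{v}_k \Big) = \bigotimes_{k=0}^{n-1} T_k(\mathbf{v}_k) $$

or

$$ \Big[ \prod_{k=0}^{n-1} T_k \Big] \Big( \prod_{k=0}^{n-1} \mathbf{v}_k \Big) = \prod_{k=0}^{n-1} T_k(\mathbf{v}_k). $$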


13.3 n-Qubit Systems 


The next step in this lecture is to define the precise state space we need for a quantum 
computer that supports n qubits. I won’t back up all the way to two qubits as I did for 
the tensor product, but a short recap of three qubits will be a boon to understanding 
n qubits. 


13.3.1 Recap of Three Qubits 


A three qubit system is modeled by a third order tensor product of three identical
copies of our friendly spin-1/2 Hilbert space, $\mathcal{H}$. We can use either the
order-notation or component space label notation to signify the product space,

$$ \mathcal{H}_{(3)} = \mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C. $$

The dimension is 2 × 2 × 2 = 8.


Three Qubit CBS and Coordinate Convention

The natural three qubit tensor basis is constructed by forming all possible separable
products from the component space basis vectors, and we continue to use our CBS
ket notation. The CBS for $\mathcal{H}_{(3)}$ is therefore

    { |0⟩⊗|0⟩⊗|0⟩,  |0⟩⊗|0⟩⊗|1⟩,  |0⟩⊗|1⟩⊗|0⟩,  |0⟩⊗|1⟩⊗|1⟩,
      |1⟩⊗|0⟩⊗|0⟩,  |1⟩⊗|0⟩⊗|1⟩,  |1⟩⊗|1⟩⊗|0⟩,  |1⟩⊗|1⟩⊗|1⟩ },

with the often used shorthand options

    |0⟩⊗|0⟩⊗|0⟩  ↔  |0⟩|0⟩|0⟩  ↔  |000⟩  ↔  |0⟩³
    |0⟩⊗|0⟩⊗|1⟩  ↔  |0⟩|0⟩|1⟩  ↔  |001⟩  ↔  |1⟩³
    |0⟩⊗|1⟩⊗|0⟩  ↔  |0⟩|1⟩|0⟩  ↔  |010⟩  ↔  |2⟩³
    |0⟩⊗|1⟩⊗|1⟩  ↔  |0⟩|1⟩|1⟩  ↔  |011⟩  ↔  |3⟩³
    |1⟩⊗|0⟩⊗|0⟩  ↔  |1⟩|0⟩|0⟩  ↔  |100⟩  ↔  |4⟩³
    |1⟩⊗|0⟩⊗|1⟩  ↔  |1⟩|0⟩|1⟩  ↔  |101⟩  ↔  |5⟩³
    |1⟩⊗|1⟩⊗|0⟩  ↔  |1⟩|1⟩|0⟩  ↔  |110⟩  ↔  |6⟩³
    |1⟩⊗|1⟩⊗|1⟩  ↔  |1⟩|1⟩|1⟩  ↔  |111⟩  ↔  |7⟩³


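The notation of the first two columns admits the possibility of labeling each of the
component kets with the $\mathcal{H}$ from which it came, A, B or C,

$$
\begin{aligned}
|0\rangle_A \otimes |0\rangle_B \otimes |0\rangle_C \;&\leftrightarrow\; |0\rangle_A\,|0\rangle_B\,|0\rangle_C, \\
|0\rangle_A \otimes |0\rangle_B \otimes |1\rangle_C \;&\leftrightarrow\; |0\rangle_A\,|0\rangle_B\,|1\rangle_C, \\
&\text{etc.}
\end{aligned}
$$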


The densest of the notations expresses the CBS ket as an integer from 0 to 7. This
can all be summarized by looking at the third order coordinate representation of these
eight tensors (each column below is the coordinate vector of the ket that heads it):

    |000⟩  |001⟩  |010⟩  |011⟩  |100⟩  |101⟩  |110⟩  |111⟩
    |0⟩³   |1⟩³   |2⟩³   |3⟩³   |4⟩³   |5⟩³   |6⟩³   |7⟩³
      1      0      0      0      0      0      0      0
      0      1      0      0      0      0      0      0
      0      0      1      0      0      0      0      0
      0      0      0      1      0      0      0      0
      0      0      0      0      1      0      0      0
      0      0      0      0      0      1      0      0
      0      0      0      0      0      0      1      0
      0      0      0      0      0      0      0      1

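A typical three qubit value is a normalized superposition of the eight CBS, e.g.,

$$
\frac{|3\rangle^3 + |5\rangle^3}{\sqrt{2}}, \qquad
\frac{|2\rangle^3 + |3\rangle^3 + |7\rangle^3}{\sqrt{3}}, \qquad
\sqrt{.1}\,|0\rangle^3 + \sqrt{-.6}\,|2\rangle^3 - \sqrt{.05}\,|4\rangle^3 - \big( \sqrt{.05} + i\sqrt{.2} \big)\,|6\rangle^3,
$$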


or most generally,

$$ \sum_{k=0}^{7} c_k\,|k\rangle^3, \qquad \text{where} \qquad \sum_{k=0}^{7} |c_k|^2 = 1. $$


13.3.2 Three Qubit Logic Gates; the Toffoli Gate


Order three quantum logic gates are unitary operators on $\mathcal{H}_{(3)}$. They can
be constructed by taking, for example,

• the separable product of three first order operators,

• the separable product of a second order and a first order operator, or

• any operator defined by a possibly non-separable unitary matrix.

In this course, we won’t be studying third order gates other than the ones that are
separable products of first order gates (like $H^{\otimes 3}$ discussed in the next
section). However, let’s meet one, the Toffoli gate, which plays a role in reversible
computation and some of our work in the later courses CS 83B and CS 83C.


The Symbol


The gate is usually drawn without an enclosing box as in

    ───●───
       │
    ───●───
       │
    ───⊕───

but can be boxed to emphasize its overall effect on a tripartite state,

      ┌───────┐
    ──┼───●───┼──
      │   │   │
    ──┼───●───┼──
      │   │   │
    ──┼───⊕───┼──
      └───────┘

The A and B registers are the control bits, and the C register the target bit,

    “control bits”   A ───●───
                          │
                     B ───●───
                          │
    “target bit”     C ───⊕───

two terms that will become clear in the next bullet point.

At times I’ll use all caps, as in TOFFOLI, to name the gate in order to give it the
status of its simpler cousin, CNOT.


Action on the CBS


The TOFFOLI gate has the following effect on the computational basis states:

    |x⟩ ───●─── |x⟩
           │
    |y⟩ ───●─── |y⟩
           │
    |z⟩ ───⊕─── |(x ∧ y) ⊕ z⟩

In terms of the eight CBS tensors, it leaves the A and B registers unchanged and
negates the C register qubit or leaves it alone based on whether the AND of the
control bits is “1” or “0”:

$$
|z\rangle \longmapsto
\begin{cases}
|z\rangle, & \text{if } x \wedge y = 0 \\
|\neg z\rangle, & \text{if } x \wedge y = 1
\end{cases}
$$

It is a controlled-NOT operator, but the control consists of two bits rather than one.

Remember, not every CBS definition we can drum up will result in a unitary
operator, especially when we start defining the output kets in terms of arbitrary
classical operations. In an exercise during your two qubit lesson you met a bipartite
“gate” which seemed simple enough but turned out not to be unitary. So we must
confirm this property in the next bullet.


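The Matrix


We compute the column vectors of the matrix by applying TOFFOLI to the CBS
tensors to get

$$
\begin{aligned}
M_{\text{TOFFOLI}} &= \Big( \text{TOFFOLI}\,|000\rangle,\; \text{TOFFOLI}\,|001\rangle,\; \ldots,\; \text{TOFFOLI}\,|111\rangle \Big) \\
&= \Big( |000\rangle,\; |001\rangle,\; |010\rangle,\; |011\rangle,\; |100\rangle,\; |101\rangle,\; |111\rangle,\; |110\rangle \Big) \\
&= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix},
\end{aligned}
$$

which is an identity matrix until we reach the last two rows (columns) where it swaps
those rows (columns). It is unitary.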
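The same matrix can be generated from the CBS rule, and its unitarity confirmed by multiplying its transpose by itself (the entries are real, so the adjoint is just the transpose). This is a sketch with illustrative helper names; the encoding of |xyz⟩ as the integer 4x + 2y + z follows the coordinate convention used above.

```python
def toffoli_index(k):
    """Map the CBS label k = 4x + 2y + z to the label of TOFFOLI|xyz>."""
    x, y, z = (k >> 2) & 1, (k >> 1) & 1, k & 1
    z ^= x & y                       # negate the target only when x AND y = 1
    return (x << 2) | (y << 1) | z


def toffoli_matrix():
    """Build the 8x8 matrix column by column from the action on the CBS."""
    m = [[0] * 8 for _ in range(8)]
    for col in range(8):
        m[toffoli_index(col)][col] = 1
    return m


def transpose_times(m):
    """Return (M transpose) * M for a real square matrix M."""
    n = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M = toffoli_matrix()
IDENTITY_8 = [[int(i == j) for j in range(8)] for i in range(8)]

print(transpose_times(M) == IDENTITY_8)       # unitary
print([toffoli_index(k) for k in range(8)])   # only labels 6 and 7 trade places
```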


[Exercise. Prove that this is not a separable operator. ] 




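Behavior on General State


Applying TOFFOLI to the general state,

$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \\ c_6 \\ c_7 \end{pmatrix}
=
\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \\ c_7 \\ c_6 \end{pmatrix}.
$$

This is as far as we need to go on the Toffoli gate. Our interest here is in higher order
gates that are separable products of unary gates.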


13.3.3 n Qubits 
Definition and Notation 


Definition of n Qubits. n qubits are, collectively, the tensor product 
of n identical copies of the 2-D Hilbert space H, and the value of the n 
qubits at any given moment is a particular tensor in this space having unit 
length. 


Note. As usual, we consider two unit-tensors which differ by a phase factor, e” 


for real 6, to be the same n qubit value. 
We can designate this state space using the notation 


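\[
\mathcal{H}_{(n)} \;=\; \underbrace{\mathcal{H} \otimes \mathcal{H} \otimes \cdots \otimes \mathcal{H}}_{n} \;=\; \bigotimes^{n} \mathcal{H}.
\]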


n Qubit Computational Basis States 


The natural n qubit tensor basis is constructed by forming all possible separable 
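products from the component space basis vectors. The CBS for $\mathcal{H}_{(n)}$ is therefore 

\[
\begin{aligned}
\Big\{\ & |0\rangle \otimes \cdots \otimes |0\rangle \otimes |0\rangle \otimes |0\rangle,\quad
|0\rangle \otimes \cdots \otimes |0\rangle \otimes |0\rangle \otimes |1\rangle,\quad
|0\rangle \otimes \cdots \otimes |0\rangle \otimes |1\rangle \otimes |0\rangle, \\
& |0\rangle \otimes \cdots \otimes |0\rangle \otimes |1\rangle \otimes |1\rangle,\quad
|0\rangle \otimes \cdots \otimes |1\rangle \otimes |0\rangle \otimes |0\rangle,\quad \ldots, \\
& |1\rangle \otimes \cdots \otimes |1\rangle \otimes |1\rangle \otimes |0\rangle,\quad
|1\rangle \otimes \cdots \otimes |1\rangle \otimes |1\rangle \otimes |1\rangle \ \Big\},
\end{aligned}
\]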




with the shorthand options 


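\[
\begin{array}{ccccccc}
|0\rangle \otimes \cdots \otimes |0\rangle \otimes |0\rangle & \longleftrightarrow & |0\rangle \cdots |0\rangle |0\rangle & \longleftrightarrow & |0 \cdots 000\rangle & \longleftrightarrow & |0\rangle^n \\
|0\rangle \otimes \cdots \otimes |0\rangle \otimes |1\rangle & \longleftrightarrow & |0\rangle \cdots |0\rangle |1\rangle & \longleftrightarrow & |0 \cdots 001\rangle & \longleftrightarrow & |1\rangle^n \\
|0\rangle \otimes \cdots \otimes |1\rangle \otimes |0\rangle & \longleftrightarrow & |0\rangle \cdots |1\rangle |0\rangle & \longleftrightarrow & |0 \cdots 010\rangle & \longleftrightarrow & |2\rangle^n \\
|0\rangle \otimes \cdots \otimes |1\rangle \otimes |1\rangle & \longleftrightarrow & |0\rangle \cdots |1\rangle |1\rangle & \longleftrightarrow & |0 \cdots 011\rangle & \longleftrightarrow & |3\rangle^n \\
\vdots & & \vdots & & \vdots & & \vdots \\
|1\rangle \otimes \cdots \otimes |1\rangle \otimes |0\rangle & \longleftrightarrow & |1\rangle \cdots |1\rangle |0\rangle & \longleftrightarrow & |1 \cdots 110\rangle & \longleftrightarrow & |2^n - 2\rangle^n \\
|1\rangle \otimes \cdots \otimes |1\rangle \otimes |1\rangle & \longleftrightarrow & |1\rangle \cdots |1\rangle |1\rangle & \longleftrightarrow & |1 \cdots 111\rangle & \longleftrightarrow & |2^n - 1\rangle^n
\end{array}
\]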


Dimension of $\mathcal{H}_{(n)}$ 


As you can tell by counting, there are $2^n$ basis tensors in the product space, which 
makes sense because the dimension of the product space is the product of the dimensions 
of the component spaces; since $\dim(\mathcal{H}) = 2$, 

\[
\dim\left(\mathcal{H}_{(n)}\right) \;=\; 2 \times 2 \times \cdots \times 2 \;=\; 2^n. \qquad ✓
\]


For nth order CBS kets we usually label each component ket using the letter x 
with its corresponding space label, 


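\[
|x_{n-1}\rangle \otimes |x_{n-2}\rangle \otimes |x_{n-3}\rangle \otimes \cdots \otimes |x_0\rangle, \qquad x_k \in \{0, 1\},
\]
with the more common and denser alternatives 
\[
|x_{n-1}\rangle\,|x_{n-2}\rangle \cdots |x_1\rangle\,|x_0\rangle, \qquad |x_{n-1}\, x_{n-2} \cdots x_1\, x_0\rangle,
\]
and densest of them all, 

\[
|x\rangle^n, \qquad x \in \{0, 1, 2, 3, \ldots, 2^n - 1\}.
\]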


For example, for n = 5, 


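\[
\begin{array}{rcl}
|0\rangle^5 & \longleftrightarrow & |00000\rangle, \\
|1\rangle^5 & \longleftrightarrow & |00001\rangle, \\
|2\rangle^5 & \longleftrightarrow & |00010\rangle, \\
|8\rangle^5 & \longleftrightarrow & |01000\rangle, \\
|23\rangle^5 & \longleftrightarrow & |10111\rangle,
\end{array}
\]
and, in general, 
\[
|x\rangle^5 \;\longleftrightarrow\; |x_4 x_3 x_2 x_1 x_0\rangle.
\]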
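The encoded-integer shorthand is just fixed-width binary notation, which a short sketch can make concrete. The function names `cbs_label` and `cbs_value` are hypothetical helpers introduced here for illustration; they are not notation from the course.

```python
def cbs_label(x, n):
    """Return the n-character bit string x_{n-1}...x_1 x_0 for the encoded ket |x>^n."""
    if not 0 <= x < 2 ** n:
        raise ValueError("x must lie in 0 .. 2^n - 1")
    return format(x, "0{}b".format(n))

def cbs_value(label):
    """Inverse direction: recover the encoded integer from the bit string."""
    return int(label, 2)

# The n = 5 examples from the text.
examples = {x: cbs_label(x, 5) for x in (0, 1, 2, 8, 23)}
```

The most significant bit sits on the left, matching the convention that $x_{n-1}$ labels the leftmost component ket.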




13.3.4 n Qubit Logic Gates 


Quantum logic gates of order n > 3 are nothing more than unitary operators of order 
n > 3, which we defined above. There’s no need to say anything further about a 
general nth order logic gate. Instead, let’s get right down to the business of describing 
the specific example that will pervade the remainder of the course. 


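The nth Order Hadamard Gate, $H^{\otimes n}$ 


We generalize the two-qubit Hadamard gate, $H^{\otimes 2}$, to n qubits naturally. It is the local 
operator that behaves like n individual unary H gates if presented with a separable 
input state, 

[Circuit: n parallel wires, each passing through its own unary $H$ gate (n copies).]

or described in terms of the CBS, 

[Circuit: the input $|x_{n-1} x_{n-2} \cdots x_1 x_0\rangle$ enters $H^{\otimes n}$; the n output wires carry $H|x_{n-1}\rangle$, $H|x_{n-2}\rangle$, ..., $H|x_1\rangle$, $H|x_0\rangle$.]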

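Algebraically, we developed the formula for the second order Hadamard, 

\[
H^{\otimes 2}\,|x\rangle^2 \;=\; \frac{1}{2} \sum_{y=0}^{3} (-1)^{x \odot y}\, |y\rangle^2,
\]

where “$\odot$” stands for the mod-2 dot product. It is only a matter of expending more 
time and graphite to prove that this turns into the higher order version, 

\[
H^{\otimes n}\,|x\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{x \odot y}\, |y\rangle^n.
\]

Putting this result into the circuit diagram gives us 

[Circuit: $|x\rangle^n$ enters $H^{\otimes n}$ and emerges as $\left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{x \odot y}\, |y\rangle^n$.]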
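To see the formula in action, here is a small sketch that evaluates the coefficients $(1/\sqrt{2})^n (-1)^{x \odot y}$ directly. It assumes x and y are ordinary Python integers whose binary digits play the role of the qubit labels; `mod2_dot` and `hadamard_amplitudes` are hypothetical names used only here.

```python
import math

def mod2_dot(x, y):
    """Mod-2 dot product of the binary digits of x and y."""
    return bin(x & y).count("1") % 2

def hadamard_amplitudes(x, n):
    """z-basis coordinates of H^{(x)n} |x>^n, computed from the closed formula."""
    scale = (1 / math.sqrt(2)) ** n
    return [scale * (-1) ** mod2_dot(x, y) for y in range(2 ** n)]

# Second-order example: x = 2 is the CBS ket |10>.
amps = hadamard_amplitudes(2, 2)
signs = [round(a * 2) for a in amps]   # strip the factor 1/2 to expose the signs
```

For x = 2 the sign pattern is +, +, -, -, i.e. the state (|00> + |01> - |10> - |11>)/2.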




Vector Notation 


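We'll sometimes present the formula using vector dot products. If x and y are 
considered to be vectors of 1s and 0s, we would represent them using boldface $\mathbf{x}$ and 
$\mathbf{y}$, 

\[
x \;\longleftrightarrow\; \mathbf{x} = \begin{pmatrix} x_{n-1} \\ x_{n-2} \\ \vdots \\ x_1 \\ x_0 \end{pmatrix},
\qquad
y \;\longleftrightarrow\; \mathbf{y} = \begin{pmatrix} y_{n-1} \\ y_{n-2} \\ \vdots \\ y_1 \\ y_0 \end{pmatrix}.
\]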


When so expressed, the dot product between vector x and vector y is considered the 
mod-2 dot product, 


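\[
\mathbf{x} \cdot \mathbf{y} \;=\; x_{n-1} y_{n-1} \,\oplus\, x_{n-2} y_{n-2} \,\oplus\, \cdots \,\oplus\, x_1 y_1 \,\oplus\, x_0 y_0.
\]
This results in an equivalent form of the Hadamard gate using vector notation, 
\[
H^{\otimes n}\,|x\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{\mathbf{x} \cdot \mathbf{y}}\, |y\rangle^n.
\]


I'll remind you about this when the time comes. 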
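The vector form and the encoded-integer form of the mod-2 dot product describe the same number, which the sketch below checks exhaustively for small n. The helper names (`bits`, `vector_dot_mod2`, `bitwise_dot_mod2`) are assumptions made for this illustration.

```python
def bits(x, n):
    """Vector (x_{n-1}, ..., x_1, x_0) of binary digits of the encoded integer x."""
    return [(x >> k) & 1 for k in range(n - 1, -1, -1)]

def vector_dot_mod2(xv, yv):
    """x_{n-1} y_{n-1} XOR ... XOR x_0 y_0, computed component by component."""
    total = 0
    for a, b in zip(xv, yv):
        total ^= a & b
    return total

def bitwise_dot_mod2(x, y):
    """Same quantity computed on the encoded integers: parity of the bits of x AND y."""
    return bin(x & y).count("1") % 2

n = 4
forms_agree = all(
    vector_dot_mod2(bits(x, n), bits(y, n)) == bitwise_dot_mod2(x, y)
    for x in range(2 ** n) for y in range(2 ** n)
)
```

For instance, with x = 101 and y = 111 the products are 1, 0, 1, whose XOR is 0.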


Higher Order Basis Conversion 


We'll be doing higher level basis conversions frequently, especially between the z-basis 
and the x-basis. Let’s review and extend our knowledge. 


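The induced z-basis for $\mathcal{H}_{(n)}$ is 
\[
|0\rangle|0\rangle \cdots |0\rangle|0\rangle,\quad |0\rangle|0\rangle \cdots |0\rangle|1\rangle,\quad |0\rangle|0\rangle \cdots |1\rangle|0\rangle,\quad \ldots,\quad |1\rangle|1\rangle \cdots |1\rangle|1\rangle,
\]
while the induced x-basis for $\mathcal{H}_{(n)}$ is 
\[
|0\rangle_x|0\rangle_x \cdots |0\rangle_x|0\rangle_x,\quad |0\rangle_x|0\rangle_x \cdots |0\rangle_x|1\rangle_x,\quad |0\rangle_x|0\rangle_x \cdots |1\rangle_x|0\rangle_x,\quad \ldots,\quad |1\rangle_x|1\rangle_x \cdots |1\rangle_x|1\rangle_x.
\]
Using our alternate order-one x-basis notation, 

\[
|+\rangle \;\equiv\; |0\rangle_x \qquad \text{and} \qquad |-\rangle \;\equiv\; |1\rangle_x,
\]

the induced x-basis CBS can also be written without using the letter “x” as a label, 

\[
|+\rangle|+\rangle \cdots |+\rangle|+\rangle,\quad |+\rangle|+\rangle \cdots |+\rangle|-\rangle,\quad |+\rangle|+\rangle \cdots |-\rangle|+\rangle,\quad \ldots,\quad |-\rangle|-\rangle \cdots |-\rangle|-\rangle.
\]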




Since H converts to and from the x and z bases in $\mathcal{H}$, it is easy to confirm that 
the separable $H^{\otimes n}$ converts to and from these two bases in $\mathcal{H}_{(n)}$. 


[Exercise. Do it.] 


Notation. In order to make the higher order x-CBS kets of $\mathcal{H}_{(n)}$ less confusing 
(we need “x” as an encoded integer specifying the CBS state), I'm going to call to 
duty some non-standard notation that I introduced in our two qubit lecture: I'll use 
the subscript “±” to indicate a CBS relative to the x-basis: 

\[
|y\rangle^n_\pm \;\equiv\; n\text{th-order encoded input, } y, \text{ relative to the x-basis.}
\]

That is, if y is an integer from 0 to $2^n - 1$, when you see the ± subscript on the CBS 
ket you know that its binary representation is telling us which x-basis (not z-basis) 
ket it represents. So, 


for example, with n = 3, 

\[
|5\rangle^3_\pm \;=\; |1\rangle_x|0\rangle_x|1\rangle_x \;=\; |-\rangle|+\rangle|-\rangle.
\]


(Without the subscript “±,” of course, we mean the usual z-basis CBS.) This frees 
up the variable x for use inside the ket, 

\[
|x\rangle^n_\pm \;\equiv\; n\text{th-order encoded input, } x, \text{ relative to the x-basis.}
\]

With this notation, the change-of-basis is described simply as 

\[
H^{\otimes n}\,|x\rangle^n \;=\; |x\rangle^n_\pm
\]

and in the other direction as 

\[
H^{\otimes n}\,|x\rangle^n_\pm \;=\; |x\rangle^n.
\]
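Both directions of the basis change follow from the fact that the nth order Hadamard is its own inverse. A minimal numerical sketch, assuming the matrix is assembled as an n-fold Kronecker product of the unary Hadamard (the helper names `kron`, `hadamard`, `apply`, and `z_ket` are hypothetical):

```python
import math

def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def hadamard(n):
    """Matrix of the nth order Hadamard, built as H (x) H (x) ... (x) H."""
    s = 1 / math.sqrt(2)
    H = [[s, s], [s, -s]]
    M = [[1.0]]
    for _ in range(n):
        M = kron(H, M)
    return M

def apply(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def z_ket(x, n):
    """Coordinates of the z-basis CBS ket |x>^n."""
    return [1.0 if k == x else 0.0 for k in range(2 ** n)]

n, x = 3, 5
Hn = hadamard(n)
x_ket = apply(Hn, z_ket(x, n))   # x-basis ket number x, expanded along the z-basis
back = apply(Hn, x_ket)          # a second application returns the z-basis ket
round_trip_ok = all(abs(a - b) < 1e-9 for a, b in zip(back, z_ket(x, n)))
```

The round trip succeeds for every x, which is the matrix form of the two change-of-basis equations above.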


Natural Coordinates for the x-Basis 


In the last section we were looking at the separable form of the CBS for an nth order 
Hilbert space. Let’s count for a moment. Whether we have an x-basis, z-basis or any 
other basis induced from the n component Hs, there are n factors in the separable 
factorization, i.e., 


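\[
\underbrace{|0\rangle|1\rangle|1\rangle \cdots |0\rangle|1\rangle}_{n \text{ components}}
\qquad \text{or} \qquad
\underbrace{|+\rangle|-\rangle|-\rangle \cdots |+\rangle|-\rangle}_{n \text{ components}}.
\]

But when expanded along any basis these states have $2^n$ components (because the 
product space is $2^n$ dimensional). From our linear algebra and tensor product lessons 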




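we recall that a basis vector, $b_k$, expanded along its own basis, $\mathcal{B}$, contains a single 1 
and the rest 0s. In coordinate form that looks like 

\[
b_k \;=\; \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}_{\mathcal{B}} \longleftarrow k\text{th element.}
\]

This column vector is very tall in the current context, whether a z-basis ket, 

\[
|x\rangle^n \;=\; \left.\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}_{z}\ \right\} 2^n \text{ rows} \qquad (1 \text{ in the } x\text{th element}),
\]
an x-basis CBS ket, 
\[
|x\rangle^n_\pm \;=\; \left.\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}_{\pm}\ \right\} 2^n \text{ rows} \qquad (1 \text{ in the } x\text{th element}),
\]

or more to the point of this exploration, an x-basis ket expanded along the z-basis, 

\[
|x\rangle^n_\pm \;=\; \left.\begin{pmatrix} ? \\ ? \\ \vdots \\ ? \\ ? \end{pmatrix}_{z}\ \right\} 2^n \text{ rows.}
\]

Actually, we know what those ?s are because it is the $H^{\otimes n}$ which turns the z-CBS 
into an x-CBS, 

[Circuit: $|x\rangle^n$ enters $H^{\otimes n}$ and emerges as $|x\rangle^n_\pm$.]

and we have already seen the result of $H^{\otimes n}$ applied to any $|x\rangle^n$, namely, 

\[
|x\rangle^n_\pm \;=\; H^{\otimes n}\,|x\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{x \odot y}\, |y\rangle^n,
\]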




which, when written out, looks something like 

\[
\frac{|0\rangle^n \pm |1\rangle^n \pm |2\rangle^n \pm |3\rangle^n \pm \cdots \pm |2^n - 1\rangle^n}{(\sqrt{2})^{n}}.
\]

In coordinate form that's 

\[
|x\rangle^n_\pm \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \begin{pmatrix} 1 \\ \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix}_{z}.
\]

But we can do better. Not all possible sums and differences will appear in the sum, 
so not all possible combinations of +1 and -1 will appear in an x-basis ket's column 
vector (not counting the scalar factor $(1/\sqrt{2})^n$). An x-CBS ket, $|x\rangle^n_\pm$, will have exactly 
the same number of +s as -s in its expansion (and +1s, -1s in its coordinate vector), 
except for $|0\rangle^n_\pm$, which has all +s (+1s). How do we know this? 


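We start by looking at the lowest dimension, $\mathcal{H} = \mathcal{H}_{(1)}$, where there were two 
easy-to-grasp x-kets in z-basis form, 

\[
|+\rangle \;=\; \frac{|0\rangle + |1\rangle}{\sqrt{2}} \qquad \text{and} \qquad |-\rangle \;=\; \frac{|0\rangle - |1\rangle}{\sqrt{2}}.
\]

The claim is easily confirmed here with only two kets to check. Stepping up to second 
order, the x-kets expanded along the z-basis were found to be 

\[
\begin{aligned}
|+\rangle|+\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle + |1\rangle|1\rangle}{2}, \\
|+\rangle|-\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle + |1\rangle|0\rangle - |1\rangle|1\rangle}{2}, \\
|-\rangle|+\rangle &= \frac{|0\rangle|0\rangle + |0\rangle|1\rangle - |1\rangle|0\rangle - |1\rangle|1\rangle}{2}, \\
|-\rangle|-\rangle &= \frac{|0\rangle|0\rangle - |0\rangle|1\rangle - |1\rangle|0\rangle + |1\rangle|1\rangle}{2},
\end{aligned}
\]

also easily seen to satisfy the claim. Let's make a prediction. 
Little Lemma. For an order-n $\mathcal{H}_{(n)}$, a typical x-CBS ket looks like 
\[
\frac{|0\rangle^n \pm |1\rangle^n \pm |2\rangle^n \pm |3\rangle^n \pm \cdots \pm |2^n - 1\rangle^n}{(\sqrt{2})^{n}},
\]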




where (except for $|0\rangle^n_\pm$, which has all plus signs) the sum will always have an equal 
number of +s and -s. 

[Caution. This doesn't mean that every sum with an equal number of positive 
and negative coefficients is necessarily an x-CBS ket; there are still more ways to 
distribute the +s and -s equally than there are CBS kets, so the distribution of the 
plus and minus signs has to be even further restricted if the superposition above is to 
represent an x-basis ket. But just knowing that all x-CBS tensors, when expanded 
along the z-basis, are “balanced” in this sense, will help us understand and predict 
quantum circuits.] 


Proof of Lemma. We already know that the lemma is true for first and second 
order state spaces because we are staring directly into the eyes of the two x-bases, 
above. But let's see why the Hadamard operators tell the same story. The matrix for 
$H^{\otimes 2}$, which is used to convert the second order z-basis to an x-basis, is 

\[
H^{\otimes 2} \;=\; \frac{1}{2} \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}.
\]

If we forget about the common factor $\frac{1}{2}$, it has a first column of +1s, and all its 
remaining columns have equal numbers of +1s and -1s. If we apply $H^{\otimes 2}$ to $|0\rangle^2 = (1, 0, 0, 0)^t$ 
we get the first column, all +1s. If we apply it to any $|x\rangle^2$, for x > 0, 
say, $|2\rangle^2 = (0, 0, 1, 0)^t$, we get one of the other columns, each one of which has an 
equal number of +1s and -1s. 


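To reproduce this claim for any higher order Hadamard, we just show that the 
matrix for $H^{\otimes n}$ (which generates the nth order x-basis) will also have all +1s in the 
left column and equal numbers of +1s and -1s in the other columns. This is done 
formally by recursion, but we can get the gist by noting how we extend the claim 
from n = 2 to n = 3. By definition, 

\[
H^{\otimes 3} \;=\; H \otimes H \otimes H \;=\; H \otimes H^{\otimes 2},
\]

or, in terms of matrices, 

\[
H^{\otimes 3} \;=\; \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\otimes
\frac{1}{2} \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}.
\]

By our technique for calculating tensor product matrices, we know that the matrix 
on the right will appear four times in the 8 x 8 product matrix, with the lower right 
copy being negated (due to the -1 in the lower right of the smaller left matrix). To 
wit, 

\[
H^{\otimes 3} \;=\; \left(\frac{1}{\sqrt{2}}\right)^{3}
\left(\begin{array}{rrrr|rrrr}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
\hline
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{array}\right).
\]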


Therefore, except for the first column (all +1s), the tensor product’s columns will all 
be a doubling (vertical stacking) of the columns of the balanced 4 x 4 (or a negated 
4 x 4). Stacking two balanced columns above one another produces a column that is 
twice as tall, but still balanced. QED 
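The stacking argument can be checked mechanically for the first few orders. The sketch below is an illustration rather than a proof: it builds the sign pattern of the order-n Hadamard by the block recursion (two copies on top, a copy and its negation underneath) and sums each column. The names `sign_matrix` and `column_sums` are hypothetical.

```python
def sign_matrix(n):
    """Sign pattern of the order-n Hadamard matrix (scalar factor dropped)."""
    M = [[1]]
    for _ in range(n):
        top = [row + row for row in M]                      # [ M   M ]
        bottom = [row + [-v for v in row] for row in M]     # [ M  -M ]
        M = top + bottom
    return M

def column_sums(M):
    return [sum(M[r][c] for r in range(len(M))) for c in range(len(M))]

# A column is balanced exactly when its +1s and -1s cancel, i.e. its sum is 0.
lemma_holds = all(
    column_sums(sign_matrix(n)) == [2 ** n] + [0] * (2 ** n - 1)
    for n in range(1, 6)
)
```

The first column sums to $2^n$ (all +1s) and every other column sums to 0, as the lemma predicts.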


[Exercise. Give a rigorous proof by showing how one extends an order (n - 1) 
Hadamard matrix to an order n Hadamard matrix.] 


13.3.5 Oracles for n Qubit Functions 


We saw that quantum oracles for functions of a single Boolean variable helped produce 
some early quantum algorithms. Now that we are graduating to n qubit circuits useful 
in studying Boolean functions of n binary inputs, it's time to upgrade our definition 
of quantum oracles to cover multi-input fs. (We retain the assumption that our 
functions produce only a single Boolean output value; they're not vector functions.) 


We are given a black box for an n-input Boolean function, $f(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$, 

[Circuit: the n inputs $x_{n-1}, x_{n-2}, \ldots, x_1, x_0$ enter a box labeled $f$, which emits the single output $f(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$.]

We assume that circuit theory enables us to build an (n + 1)-in, (n + 1)-out oracle, 
$U_f$, defined on the $2^{n+1}$ CBS kets 
\[
\big\{\, |x\rangle^n\, |y\rangle \,\big\},
\]

where $x \in \{0, 1, 2, \ldots, 2^n - 1\}$ is in encoded form, $y \in \{0, 1\}$, and the circuit's 
action on these CBS is 

[Circuit: the data register $|x\rangle^n$ passes through $U_f$ unchanged; the target register $|y\rangle$ emerges as $|y \oplus f(x)\rangle$.]


• A consequence of the definition is that $U_f$ is its own inverse (try it). 


• It is also easy to see that $U_f$ emulates f(x) by setting y = 0, 
\[
U_f \big( |x\rangle^n\, |0\rangle \big) \;=\; |x\rangle^n\, |f(x)\rangle.
\]


• We assume (it does not follow from the definition) that the oracle is of the 
same spatial circuit complexity as f(x), i.e., it grows in size at the same rate as 
f grows relative to the number of inputs, n. This is usually demonstrated to 
be true for common individual functions by manually presenting circuits that 
implement oracles for those functions. 
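The first two bullets can be tried out on a small case. In the sketch below the oracle is modeled, as an assumption of this illustration, by the permutation it induces on CBS indices, with the target bit y stored as the lowest bit of the index; `oracle_permutation` and `example_f` are hypothetical names, and `example_f` is an arbitrary Boolean function chosen only for the demonstration.

```python
def oracle_permutation(f, n):
    """List p with U_f |i> = |p[i]>, where i encodes |x>^n |y> and y is the lowest bit."""
    perm = []
    for i in range(2 ** (n + 1)):
        x, y = i >> 1, i & 1
        perm.append((x << 1) | (y ^ f(x)))   # data unchanged, target becomes y XOR f(x)
    return perm

def example_f(x):
    """An arbitrary 3-input Boolean function used only for this demonstration."""
    return 1 if x in (1, 2, 7) else 0

n = 3
p = oracle_permutation(example_f, n)

is_permutation = sorted(p) == list(range(2 ** (n + 1)))   # hence a unitary matrix
is_self_inverse = all(p[p[i]] == i for i in range(len(p)))
emulates_f = all(p[x << 1] == ((x << 1) | example_f(x)) for x in range(2 ** n))
```

Because XOR-ing the same bit twice restores the original value, applying the map twice returns every index to itself.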


13.4 Significant Deterministic Speed-Up: The Deutsch- 
Jozsa Problem 


Deutsch's algorithm enabled us to see how quantum computing could solve a problem 
faster than classical computing, but the speed-up was limited to 2x. Even setting aside 
the expense required to build the quantum circuit, that is not enough to justify the 
investment. We now restate Deutsch's problem for functions of n Boolean inputs and 
in that context call it the “Deutsch-Jozsa Problem.” We will find that the classical 
solution grows (in time) exponentially as n increases, not counting the increasing 
oracle size, while the quantum algorithm we will present next has a constant time 
solution. This is a significant speed-up relative to the oracle and does give us reason 
to believe that quantum algorithms may be of great value. (The precise meaning of 
relative vs. absolute speed-up will be presented in our upcoming lesson devoted to 
quantum oracles, but we'll discuss a couple of different ways to measure the speed-up 
informally later today.) 


We continue to study functions that have Boolean inputs and outputs, specifically 
n binary inputs and one binary output, 


\[
f : \{0, 1\}^n \longrightarrow \{0, 1\}.
\]
The Deutsch-Jozsa Problem. Given an unknown function, 

\[
f(x_{n-1}, x_{n-2}, \ldots, x_1, x_0),
\]

of n inputs that we are told is either balanced or constant, determine 
which it is in one query of the quantum oracle, $U_f$. 




13.4.1 Deutsch-Jozsa Algorithm 


The algorithm consists of building a circuit very similar to Deutsch's circuit 
and measuring the data register once. Our conclusion about f parallels the 
unary case: if we get a “0” the function is constant; if we get anything else the function is 
balanced. We'll analyze the speed-up after we prove this claim. 


The Circuit 


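We replace the unary Hadamard gates of Deutsch's circuit with nth order Hadamard 
gates to accommodate the wider data register lines, but otherwise, the circuit layout 
is organized the same: 

[Circuit: the data register $|0\rangle^n$ passes through $H^{\otimes n}$, the data channel of $U_f$, and a second $H^{\otimes n}$ before being measured; the target register $|1\rangle$ passes through $H$ and the target channel of $U_f$, and its output is ignored.]

The circuit reveals that we will be 

• applying quantum parallelism by allowing the upper Hadamard to produce a 
perfectly mixed input state, $|0\rangle^n_\pm$, into $U_f$'s data register, and 

• using the phase kick-back trick by putting $|-\rangle$ into $U_f$'s target register. 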


Preparing the Oracle’s Input: The Two Left Hadamard Gates 


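The first part of the Deutsch-Jozsa circuit (in the dashed box) prepares states that are 
needed for quantum parallelism and phase kick-back just as the lower-order Deutsch 
circuit did, 

[Circuit: the Deutsch-Jozsa circuit, with a dashed box around the two left Hadamard gates, $H^{\otimes n}$ on the data register and $H$ on the target register.]

The $H$ and $H^{\otimes n}$ operators take z-basis kets to x-basis kets in the first order $\mathcal{H}$, and 
the nth order $\mathcal{H}_{(n)}$ spaces, respectively, thus manufacturing a $|0\rangle^n_\pm$ for the data register 
input and $|-\rangle$ for the target register input, 

[Circuit: $|0\rangle^n$ enters $H^{\otimes n}$ and emerges as $|0\rangle^n_\pm$; $|1\rangle$ enters $H$ and emerges as $|-\rangle$.]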




The top gate sets up quantum parallelism and the bottom sets up the phase kick-back. 
For reference, here is the algebra: 


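\[
H^{\otimes n}\,|0\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{0 \odot y}\, |y\rangle^n
\;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} |y\rangle^n \;=\; |0\rangle^n_\pm \qquad \text{and}
\]

\[
H\,|1\rangle \;=\; \frac{|0\rangle - |1\rangle}{\sqrt{2}} \;=\; |-\rangle.
\]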


Analyzing the Oracle 


Next, we consider the effect of the oracle on these two x-basis inputs (dashed box), 

[Circuit: the Deutsch-Jozsa circuit, with a dashed box around the oracle $U_f$.]


We'll do it in stages, as before, to avoid confusion and be sure we don’t make mistakes. 


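Step 1. CBS Into Both Channels. When a natural CBS ket goes into both 
registers, the definition of $U_f$ tells us what comes out: 

[Circuit: data register $|x\rangle^n$ passes through $U_f$ unchanged; target register $|y\rangle$ emerges as $|y \oplus f(x)\rangle$.]

algebraically, 
\[
U_f \big( |x\rangle^n\, |y\rangle \big) \;=\; |x\rangle^n\, |y \oplus f(x)\rangle.
\]


Step 2. CBS Into Data and Superposition into Target. Continuing on 
with a general CBS $|x\rangle^n$ into the data register, we next allow the superposition $|-\rangle$ 
into the target register. 

\[
\begin{aligned}
U_f \big( |x\rangle^n\, |-\rangle \big)
&= U_f \left( |x\rangle^n\, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right)
 = \frac{U_f \big( |x\rangle^n |0\rangle \big) - U_f \big( |x\rangle^n |1\rangle \big)}{\sqrt{2}} \\
&= \frac{|x\rangle^n\, |0 \oplus f(x)\rangle \;-\; |x\rangle^n\, |1 \oplus f(x)\rangle}{\sqrt{2}}.
\end{aligned}
\]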




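This amounts to 

\[
U_f \big( |x\rangle^n\, |-\rangle \big)
\;=\; |x\rangle^n \left( \frac{|f(x)\rangle - |\overline{f(x)}\rangle}{\sqrt{2}} \right)
\;=\; |x\rangle^n\, (-1)^{f(x)} \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right).
\]

The scalar, $(-1)^{f(x)}$, can be kicked back, 

\[
U_f \big( |x\rangle^n\, |-\rangle \big)
\;=\; |x\rangle^n\, (-1)^{f(x)} \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right)
\;=\; \Big( (-1)^{f(x)}\, |x\rangle^n \Big)\, |-\rangle,
\]

and once again the information about f(x) is converted into an overall phase factor 
in the data register, $(-1)^{f(x)}\, |x\rangle^n$. 


From a circuit standpoint, we have accomplished 

[Circuit: data register $|x\rangle^n$ enters $U_f$ and emerges as $(-1)^{f(x)}\, |x\rangle^n$; target register $|-\rangle$ enters $U_f$ and emerges as $|-\rangle$.]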
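Phase kick-back is easy to confirm on explicit state vectors. The following sketch assumes the same index convention as before (target bit stored as the lowest bit of the CBS index) and an arbitrary example function; `apply_oracle`, `ket_x_minus`, and `example_f` are hypothetical names introduced for the illustration.

```python
import math

def apply_oracle(f, state):
    """Apply U_f to a coordinate vector over the CBS |x>^n |y> (y = lowest index bit)."""
    out = [0.0] * len(state)
    for i, amp in enumerate(state):
        x, y = i >> 1, i & 1
        out[(x << 1) | (y ^ f(x))] += amp
    return out

def ket_x_minus(x, n):
    """Coordinates of |x>^n |->, i.e. |x>^n (|0> - |1>) / sqrt(2)."""
    s = 1 / math.sqrt(2)
    state = [0.0] * 2 ** (n + 1)
    state[(x << 1) | 0] = s
    state[(x << 1) | 1] = -s
    return state

def example_f(x):
    """An arbitrary 3-input Boolean function used only for this demonstration."""
    return 1 if x in (1, 2, 7) else 0

n = 3
kickback_ok = True
for x in range(2 ** n):
    before = ket_x_minus(x, n)
    after = apply_oracle(example_f, before)
    expected = [(-1) ** example_f(x) * a for a in before]   # (-1)^f(x) |x>|->
    if any(abs(a - e) > 1e-12 for a, e in zip(after, expected)):
        kickback_ok = False
```

The oracle never changes which data ket is present; it only multiplies the whole state by +1 or -1 according to f(x).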


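Step 3. Superpositions into Both Registers. Finally, we send the full output 
of $H^{\otimes n}\,|0\rangle^n$, 

\[
|0\rangle^n_\pm \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} |y\rangle^n,
\]

into the data register so we can process f(x) for all x in a single pass and thereby 
leverage quantum parallelism. The net effect is to present the separable $|0\rangle^n_\pm \otimes |-\rangle$ to 
the oracle. Applying linearity to the last result we find 

\[
\begin{aligned}
U_f \big( |0\rangle^n_\pm\, |-\rangle \big)
&= U_f \left( \left( \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} |y\rangle^n \right) |-\rangle \right) \\
&= \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} U_f \big( |y\rangle^n\, |-\rangle \big) \\
&= \left( \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, |y\rangle^n \right) |-\rangle.
\end{aligned}
\]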




The Final Hadamard Gate 


[Warning. The version of the argument I give next is easy enough to follow and will 
“prove” the algorithm, but it may leave you with a “huh?” feeling. That's because 
it does not explain how one arrives at the decision to apply the final Hadamard. I'll 
present a more illuminating alternative at the end of this lesson that will be more 
satisfying but which requires that you activate a few more little gray cells.] 


We are ready to apply the nth order Hadamard gate in the upper right (dashed 
box), 

[Circuit: the Deutsch-Jozsa circuit, with a dashed box around the final $H^{\otimes n}$ on the data register.]

To that end, we consider how it changes the state at access point P into a state at 
the final access point Q: 

[Circuit: the Deutsch-Jozsa circuit, with access point P marked just after $U_f$ and access point Q marked just after the final $H^{\otimes n}$.]

In this phase, we are subjecting the full (n + 1)st order separable 

\[
\left( \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, |y\rangle^n \right) |-\rangle
\]

to the separable transformation $H^{\otimes n} \otimes \mathbb{1}$, which allows us to consider each component 
of the separable operation individually. We only care about the Hadamard part, since 



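it is the output of the data register we will test. It produces the output 

\[
\begin{aligned}
H^{\otimes n} \left( \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, |y\rangle^n \right)
&= \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, H^{\otimes n}\,|y\rangle^n \\
&= \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)} \left[ \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{z=0}^{2^n-1} (-1)^{y \odot z}\, |z\rangle^n \right] \\
&= \frac{1}{2^n} \sum_{z=0}^{2^n-1} \underbrace{\left[ \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, (-1)^{y \odot z} \right]}_{G(z)} |z\rangle^n,
\end{aligned}
\]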

where we have regrouped the sum and defined a scalar function, G(z), of the sum- 
mation index z. So, the final output is an expansion along the z-basis, 


\[
\text{data register at access point } Q \;=\; \frac{1}{2^n} \sum_{z=0}^{2^n-1} G(z)\, |z\rangle^n.
\]


We now look only at the coefficient, $G(0)/2^n$, of the very first CBS ket, $|0\rangle^n$. This will 
tell us something about the other $2^n - 1$ CBS coefficients, $G(z)/2^n$, for z > 0. We break 
it into two cases. 


• f is constant. In this case, f(y) is the same for all y, either 0 or 1; call it c. 
We evaluate the coefficient of $|0\rangle^n$ in the expansion, namely $G(0)/2^n$: 

\[
\frac{G(0)}{2^n} \;=\; \frac{1}{2^n} \sum_{y=0}^{2^n-1} (-1)^{c}\, (-1)^{y \odot 0} \;=\; (-1)^{c}\, \frac{2^n}{2^n} \;=\; \pm 1,
\]

thereby forcing the coefficients of all other z-basis kets in the expansion to be 0 
(why?). So in the constant case we have a CBS ket $|0\rangle^n$ at access point Q with 
certainty and are therefore guaranteed to get a reading of “0” if we measure 
the state. 


• f is balanced. This time the coefficient of $|0\rangle^n$ in the expansion is 

\[
\frac{G(0)}{2^n} \;=\; \frac{1}{2^n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, (-1)^{y \odot 0} \;=\; \frac{1}{2^n} \sum_{y=0}^{2^n-1} (-1)^{f(y)},
\]

but a balanced f promises an equal number of f(y) = 0 and f(y) = 1, so the 
sum has an equal number of +1s and -1s, forcing it to be 0. Therefore, the 
probability of a measurement causing a collapse to the state $|0\rangle^n$ is 0 (the 
amplitude-squared of the CBS state $|0\rangle^n$). We are guaranteed to never get a 
reading of “0” when we measure the data register at access point Q. 




The Deutsch-Jozsa Algorithm in Summary 


We've explained the purpose of all the components in the circuit and how each plays 
a role in leveraging quantum parallelism and phase kick-back. The result is extremely 
easy to state. We run the circuit 


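[Circuit: the data register $|0\rangle^n$ passes through $H^{\otimes n}$, the data channel of $U_f$, and a second $H^{\otimes n}$ before being measured; the target register $|1\rangle$ passes through $H$ and the target channel of $U_f$, and its output is ignored.]

one time only and measure the data register output in the natural basis. 


• If we read “0” then f is constant. 


• If we read “x” for any other x (i.e., $x \in [1, 2^n - 1]$), then f is balanced. 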
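The whole algorithm can be simulated classically for small n by evaluating the coefficients G(z) derived above. This is a sketch under the assumption that f is supplied as an ordinary Python function on encoded integers; it costs $4^n$ arithmetic steps, so it illustrates the measurement statistics rather than the speed-up. The names `output_probabilities` and `classify` are hypothetical.

```python
import random

def mod2_dot(x, y):
    """Mod-2 dot product of the binary digits of x and y."""
    return bin(x & y).count("1") % 2

def output_probabilities(f, n):
    """Probability of reading each z at access point Q, using G(z) / 2^n as amplitude."""
    N = 2 ** n
    probs = []
    for z in range(N):
        G = sum((-1) ** (f(y) ^ mod2_dot(y, z)) for y in range(N))
        probs.append((G / N) ** 2)
    return probs

def classify(f, n, rng):
    """Simulate one run of the circuit and apply the decision rule from the summary."""
    probs = output_probabilities(f, n)
    z = rng.choices(range(2 ** n), weights=probs)[0]
    return "constant" if z == 0 else "balanced"

n = 3
constant_f = lambda x: 1
balanced_f = lambda x: x & 1      # half the inputs give 0, half give 1

rng = random.Random(0)
verdicts = (classify(constant_f, n, rng), classify(balanced_f, n, rng))
```

For the constant function all of the probability sits on z = 0; for any balanced function none of it does, so the verdict never depends on the random draw.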


13.4.2 Quantum vs. Classical Time Complexity 


What have we actually accomplished in efficiency? 


The Classical Time Complexity 


In order to know the answer to the Deutsch-Jozsa problem deterministically, i.e., with 
100% certainty, we would have to evaluate the function f for more than half of the 
possible inputs, i.e., at least 
\[
\frac{2^n}{2} + 1 \;=\; 2^{n-1} + 1
\]
times. That is, we'd plug just over half of the possible x values into f (say, x = 
0, 1, 2, ..., $2^{n-1}$), and if they were all the same, we'd know the function must 
be constant. If any two were distinct, we'd know it is balanced. Of course, we may get 
lucky and find that $f(0) \ne f(1)$, in which case we can declare victory (balanced) very 
quickly, but we cannot count on that. We could be very unlucky and get the same 
output for the first $2^n/2$ computations, only to know the answer with certainty on 
the $(2^n/2 + 1)$st (if it's the same as the others: constant; if not: balanced). 


While we have not had our official lecture on time complexity, we can see that 
as the number of binary inputs, n, grows, the number of required evaluations of f, 
$2^{n-1} + 1$, grows exponentially with n. However, when we consider that there are 
$N = 2^n$ encoded integers that are allowed inputs to f, then as N grows, the number 
of evaluations of f, $\frac{N}{2} + 1$, grows only linearly with N. 


The classical problem has a solution which is exponential in n (the number of 
binary inputs), or linear in $N = 2^n$ (the number of integer inputs). 
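The deterministic classical strategy just described can be written out in a few lines. This is a minimal sketch that also counts evaluations of f so the worst case is visible; the function name `classify_classically` and the example functions are hypothetical.

```python
def classify_classically(f, n):
    """Decide constant vs. balanced with certainty; return (verdict, evaluations used)."""
    first = f(0)
    queries = 1
    # Inputs 0 .. 2^(n-1) are just over half of the 2^n possibilities.
    for x in range(1, 2 ** (n - 1) + 1):
        queries += 1
        if f(x) != first:
            return "balanced", queries      # two distinct outputs settle it
    return "constant", queries              # 2^(n-1) + 1 equal outputs settle it

n = 4
lucky = classify_classically(lambda x: x & 1, n)                  # f(0) != f(1)
unlucky = classify_classically(lambda x: 1 if x >= 8 else 0, n)   # first half all 0
constant = classify_classically(lambda x: 0, n)
```

The unlucky balanced function and the constant function both consume the full $2^{n-1} + 1 = 9$ evaluations for n = 4.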




The Quantum Time Complexity 


We have solved the problem with one evaluation of $U_f$, which is assumed to have the 
same circuit complexity as f. Now you might say that this is a constant 
time solution, i.e., it does not grow at all with n, because no matter how large n is, we 
only need to evaluate $U_f$ once. We simply measure the output of the data register, x, 
test it using 


if (x > 0) 
and we're done. 


You might argue that we have overstated the case, because in order to detect 
the output of the circuit, we have to query all n bits of the data register to know 
whether we get “000···00” or an integer other than that. No computer can do that 
for arbitrarily large n without having an increasingly large circuit or increasingly long 
testing algorithm. So in practical terms, this is an evaluation of $U_f$ followed by n 
one-bit queries, something that requires n if statements. That grows linearly with 
n or, using encoded integer counting ($N = 2^n$), logarithmically (even better). Either 
way, the quantum algorithm has a better time complexity (linear vs. exponential in 
n or logarithmic vs. linear in N) than its classical counterpart. So the speed-up is 
real. 


But there’s another way to view things that puts the quantum algorithm in an even 
more favorable light. Whether quantum or classical, the number of binary registers 
to test is the same: n. So we can really ignore that hardware growth when we speak 
of time complexity relative to the classical case; the quantum algorithm can be said 
to solve the problem in constant time relative to the classical algorithm. 


Reminder. I'll define terms like logarithmic, linear and exponential time com- 
plexity in the next lecture. 


The Deterministic Result 


Either way you look at it, if you require 100% certainty of the solution, we have found 
an algorithm that is “faster” than the classical solution. 


The Non-Deterministic (Probabilistic) Result 


If, however, we allow a small error possibility for the classical case, as is only fair 
since we might expect our quantum circuit to be prone to error (a topic of the next 
course), then the classical algorithm grows neither exponentially with n nor linearly 
with N, but in fact is a constant time algorithm, just like the Deutsch-Jozsa. I’ll give 
you an outline of the reason now, and after our probability lesson, we can make it 
rigorous. 


Classical Algorithm Admitting a Small Error Probability $\varepsilon \ll 1$. 




Let’s consider the following classical algorithm. 


The M-and-Guess Algorithm. Let M be some positive integer (think 20). 
Given a Boolean function, f(x) of $x \in [0, 2^n - 1]$ which is either balanced or constant, 
i.e., one that satisfies the Deutsch-Jozsa hypothesis, we evaluate f(x) M times, each 
time at a random $x \in [0, 2^n - 1]$. We call each evaluation a “trial.” If we get two 
different outputs, $f(x') \ne f(x'')$, by the time we complete our M trials, we declare 
victory: f is balanced without a doubt. On the other hand, if we get the same output 
for all M trials, we declare near victory: We report that f is constant, with a pretty 
good certainty. 


How often will we get the wrong answer using this algorithm? 


The only way we can fail is if the function is balanced yet we declare it to be 
constant after M trials. That only happens if we are unlucky enough to get M 
straight 0s or M straight 1s from a balanced f. 


We'll call that eventuality the event 
\[
\mathcal{S} \wedge \mathcal{B},
\]

which is a symbolic way to say, “f was Balanced yet (technically AND) all trial 
outcomes were the Same.” 


Since a balanced f means there is a 50-50 chance of getting a 1 or a 0 on any trial, 
this unlucky outcome is akin to flipping a fair coin M times and getting either all 
heads or all tails. As you can intuit by imagining 20 heads or 20 tails in a sequence of 
20 fair coin tosses, this is quite unlikely. We'll explain it rigorously in the upcoming 
lesson on probability, but the answer is that the probability of this event occurring, 
designated $P(\mathcal{S} \wedge \mathcal{B})$, is 

\[
P(\mathcal{S} \wedge \mathcal{B}) \;=\; 2 \times \left(\frac{1}{2}\right)^{M} \times \left(\frac{1}{2}\right) \;=\; \frac{1}{2^M}.
\]
The factor of 2 out front is due to the fact that the error on a balanced function 
can occur two different ways, all 1s or all 0s. The final factor 1/2 is a result of an 
assumption (which could be adjusted if not true) that we are getting a constant 
function or balanced function with equal likelihood. 
So we decide beforehand the error probability we are willing to accept, say some 
$\varepsilon \ll 1$, and select M so that 
\[
\frac{1}{2^M} \;\le\; \varepsilon.
\]
This will allow our classical algorithm to complete (with the same tiny error 
probability, $\varepsilon$) in a fixed number of evaluations, M, of the function f regardless of the 
number of inputs, n. To give you an idea, 

\[
P(\mathcal{S} \wedge \mathcal{B}) \;<\;
\begin{cases}
0.000001, & \text{for } M = 20 \\
9 \times 10^{-16}, & \text{for } M = 50
\end{cases}
\]
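The M-and-Guess procedure and its error bound are short enough to sketch directly. The code below assumes the bound $1/2^M$ stated above (including the equal-likelihood assumption) and uses hypothetical names `error_bound`, `trials_needed`, and `m_and_guess`.

```python
import random

def error_bound(M):
    """P(S and B) = 1 / 2^M under the equal-likelihood assumption in the text."""
    return 0.5 ** M

def trials_needed(eps):
    """Smallest M with 1 / 2^M <= eps; note that it does not depend on n."""
    M = 0
    while error_bound(M) > eps:
        M += 1
    return M

def m_and_guess(f, n, M, rng):
    """Evaluate f at M random inputs; any disagreement proves f is balanced."""
    first = f(rng.randrange(2 ** n))
    for _ in range(M - 1):
        if f(rng.randrange(2 ** n)) != first:
            return "balanced"
    return "constant"        # correct unless a balanced f gave M equal outputs

M = trials_needed(1e-6)
guess = m_and_guess(lambda x: 0, 10, M, random.Random(1))
```

Raising n from 10 to 1000 would leave M untouched, which is the sense in which the probabilistic classical algorithm is constant time.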




Since the error probability does not increase with increasing n, the classical 
algorithm has a constant time solution, meaning that we can solve it with the same 
time complexity as the quantum Deutsch-Jozsa algorithm. (We will define terms 
like complexity and constant time precisely very soon, but you get the general idea.) 
Therefore, no realistic speed-up is gained using quantum computing if we accept a 
vanishingly small error result. 


This does not diminish the importance of the deterministic solution which does 
show a massive computational speed increase, but we must always temper our enthu- 
siasm with a dose of reality. 


13.4.3 Alternate Proof of the Deutsch-Jozsa Algorithm 


I'd like to offer a slightly more elaborate, but also more illustrative, argument for the 
final Hadamard gate and how we might guess that it is the correct way to complete 
the circuit (dashed box), 

[Circuit: the Deutsch-Jozsa circuit, with a dashed box around the final $H^{\otimes n}$ on the data register.]

Recall that we had established that at access point P, 

[Circuit: the Deutsch-Jozsa circuit, with access point P marked on the data register just after $U_f$.]

the data register was in the state 
\[
\left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, |y\rangle^n.
\]

The hypothesis of Deutsch-Jozsa tells us that f is either balanced or constant, which 
has implications about this state. 


1. If f is constant, then all the coefficients, $(-1)^{f(y)}$, are the same, and we are 
looking at $|0\rangle^n_\pm$ (or possibly -1 times this state, observationally equivalent). 


2. If f is balanced, then half the $(-1)^{f(y)}$ are +1 and half are -1. Now, our little 
lemma reminds us that this condition suggests, but does not guarantee, that 
a balanced state might, at access point P, be an x-CBS state other than $|0\rangle^n_\pm$. 




The constant case, 1, guarantees that we land in the x-CBS state, $|0\rangle^n_\pm$. The balanced 
case, 2, suggests that we might end up in one of the other x-CBS states, $|x\rangle^n_\pm$, for 
x > 0. Let's pretend that in the balanced case we are lucky enough to land exactly 
in one of those other CBS states. If so, when we measure at access point P along the 
x-basis, 


1. a measurement of “0” would imply that f was constant, and 


2. a measurement of “x,” for x > 0, would imply that f was balanced. 


This is because measuring any CBS state along its own basis gives, with 100% prob- 
ability, the value of that state; that state’s amplitude is 1 and all the rest of the CBS 
states’ amplitudes are 0. 


The Bad News. Alas, we are not able to assert that all balanced fs will produce 
x-CBS kets since there are more ways to distribute the + and - signs equally than 
there are x-CBS kets. 


The Good News. We do know something that will turn out to be pivotal: a 
balanced f will never have the CBS ket, $|0\rangle^n_\pm$, in its expansion. Let's prove it. 


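If we give the data register's state at access point P the name $|\beta\rangle^n$, 

\[
|\beta\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)}\, |y\rangle^n,
\]

then we know that a balanced f means that 

\[
|\beta\rangle^n \;=\; \left(\frac{1}{\sqrt{2}}\right)^{n} \begin{pmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \\ \pm 1 \end{pmatrix},
\qquad \text{equal numbers of } +1 \text{ and } -1.
\]

Furthermore, its $|0\rangle^n_\pm$ coefficient is given by the dot-with-the-basis-ket trick (all 
coefficients are real, so we can use a simple dot-product), 

\[
{}^{n}_{\pm}\langle 0 \,|\, \beta \rangle^n \;=\;
\left(\frac{1}{\sqrt{2}}\right)^{n} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{pmatrix}
\cdot
\left(\frac{1}{\sqrt{2}}\right)^{n} \begin{pmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \\ \pm 1 \end{pmatrix}.
\]

Aside from the scalar factors $(1/\sqrt{2})^n$, the left vector has all 1s, while the right vector 
has half +1s and half -1s, i.e., their dot product is 0: we are assured that there is 
no presence of the 0th CBS ket $|0\rangle^n_\pm$ in the expansion of a balanced f. ✓ 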




We have shown that the amplitude of the data register's $|0\rangle^n_\pm$ is 0 whenever f is 
balanced, and we already knew that its amplitude is 1 whenever f is constant, so 
measuring at access point P along the x-basis will 


• collapse to $|0\rangle^n_\pm$ if f is constant, guaranteed, and 


• never collapse to $|0\rangle^n_\pm$ if f is balanced, guaranteed. 


Conclusion: If we measure along the x-basis at access point P, a reading of “0” 
means constant and a reading of “x,” for any x > 0, means balanced. 


Measurement 


The x-basis measurement we seek is nothing more than a z-basis measurement after 
applying the nth order x ↔ z basis transforming unitary $H^{\otimes n}$. This explains the 
final nth order Hadamard gate in the upper right (dashed box), 

[Circuit: the Deutsch-Jozsa circuit, with a dashed box around the final $H^{\otimes n}$ on the data register.]

After that, when we measure the data register it will either be 


• $|0\rangle^n$, which corresponds to the pre-$H^{\otimes n}$ state of $|0\rangle^n_\pm$, and therefore indicates a 
constant f, or 


• anything else, which corresponds to a pre-$H^{\otimes n}$ state that did not contain even 
“trace amounts” of $|0\rangle^n_\pm$ before the final gate and therefore indicates a balanced 
f. 


The argument led to the same conclusion but forced us to think about the direct 
output of the oracle in terms of the x-basis, thereby guiding the decision to apply the 
final Hadamard gate. Not only that, we get a free algorithm out of it, and I'll let you 
guess what it is. 


A New Problem and a New Algorithm: You Discover it. 


[Exercise. While not all balanced functions lead to x-CBS kets at access point P, 
several do. Describe them in words or formulas.] 


Let's call the collection of functions in the last exercise $\mathcal{B}_n$. 
[Exercise. How many functions are in the set $\mathcal{B}_n$?] 


[Exercise. If you are told that an unknown function is in the set $\mathcal{B}_n$, formulate 
an algorithm using the Deutsch-Jozsa circuit that will, in a single evaluation of $U_f$, 
determine the entire truth table of the unknown function.] 




[Exercise. How many evaluations of the unknown function f would be needed 
to do this deterministically using a classical approach?] 


[Exercise. If you were to allow for a non-deterministic outcome classically, would 
you be able to get a constant time solution (one whose number of evaluations of f 
would be independent of n for a fixed error, $\varepsilon$)?] 

[Exercise. After attempting this problem, read the next section (Bernstein-Vazirani) 
and compare your results and algorithm with that seemingly distinct 
problem. Are the two truly different problems and algorithms or is there a relationship 
between them?] 


13.5 True Non-Deterministic Speed-Up: The Bernstein-
Vazirani Problem 


The quantum Deutsch-Jozsa algorithm offers a deterministic exponential speed-up 
over the classical algorithm, but realistically when we accept a small error, both it 
and its classical alternative are “constant time,” i.e., they can be solved in a fixed 
time independent of the number of inputs. We saw this when analyzing the classical 
algorithm. 


The first problem that shows a clear separation in time complexity is the 
Bernstein-Vazirani problem, in which classically, even when one accepts an error, the solution 
still grows in time with the number of inputs, n. Meanwhile, the quantum solution 
is constant time: only one evaluation of the oracle, $U_f$, is needed regardless of the 
number of inputs. 


Reminder. We'll cover probability and time complexity in formal lessons, but 
for our current purpose we can let intuition guide our computations, just as we did 
earlier with classical Deutsch-Jozsa analysis. 


We continue to study functions that have Boolean inputs and outputs, specifically
n binary inputs and one binary output,

\[ f : \{0,1\}^n \longrightarrow \{0,1\} . \]

The Bernstein-Vazirani Problem. Given an unknown function,

\[ f : \{0,1\}^n \longrightarrow \{0,1\} , \]

of n inputs that is known to be defined by a mod-2 dot product with an
n (binary) digit constant, a,

\[ f(x) = a \odot x , \]

find a in one query of the quantum oracle, $U_f$.
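
To make the problem statement concrete, here is a minimal classical sketch of such a function. It assumes we store an n-bit string as a Python integer; the names `dot_mod2` and `make_f`, and the sample constant, are hypothetical choices for illustration only.

```python
def dot_mod2(a: int, x: int) -> int:
    """Mod-2 dot product of two bit strings stored as integers."""
    # a & x keeps the positions where both bits are 1; the parity of
    # the number of such positions is the mod-2 dot product.
    return bin(a & x).count("1") % 2


def make_f(a: int):
    """Build f(x) = a . x (mod 2) for a hidden constant a."""
    return lambda x: dot_mod2(a, x)


# Hypothetical hidden constant with n = 4 binary digits.
f = make_f(0b1011)
print(f(0b0001), f(0b0100), f(0b1111))
```

Someone handed only `f` sees a black box with n binary inputs and one binary output; the task is to learn the constant hidden inside it.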




13.5.1 The Bernstein-Vazirani Algorithm 


The algorithm uses the same circuit with the same inputs and the same single data
register measurement as Deutsch-Jozsa. However, this time, instead of asking whether
we see a “0” or a non-“0” at the output, we look at the full output: its value will be
our desired unknown, a.


The Circuit 


For quick reference, here it is again: 


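[Circuit: the data register $|0\rangle^n$ passes through $H^{\otimes n}$, the oracle $U_f$, and a final $H^{\otimes n}$ before being measured; the target register $|1\rangle$ passes through $H$ and $U_f$, and its output is ignored.]


The $|0\rangle^n$ going into the top register provides the quantum parallelism and the $|1\rangle$
into the bottom offers a phase kick-back that transfers information about f from the
target output to the data output.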


Preparing the Oracle’s Input: The Two Left Hadamard Gates 


Same as Deutsch-Jozsa. The first part of the circuit prepares states that are needed 
for quantum parallelism and phase kick-back, 


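[Circuit repeated; this step concerns the two Hadamard gates to the left of $U_f$.]


The two registers going into the oracle are, again,

\[ H^{\otimes n} |0\rangle^n = \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n-1} |y\rangle^n \qquad \text{and} \qquad H |1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} = |-\rangle . \]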


Analyzing the Oracle 


The oracle (dashed box) has the same output, 


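[Circuit repeated, with the oracle $U_f$ enclosed in a dashed box.]

namely,

\[ U_f \left( |0\rangle^n_\pm \, |-\rangle \right) = \left( \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n-1} (-1)^{f(y)} \, |y\rangle^n \right) |-\rangle = \left( \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n-1} (-1)^{a \odot y} \, |y\rangle^n \right) |-\rangle . \]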


The Final Hadamard Gate 


We apply the final nth order Hadamard gate. At access point P, 


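[Circuit repeated, with access point P marked between $U_f$ and the final $H^{\otimes n}$, and access point Q marked after the final $H^{\otimes n}$.]

the data register holds the ket

\[ \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n-1} (-1)^{a \odot y} \, |y\rangle^n , \]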


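while at point Q, it ends up as (refer to the individual steps in the Deutsch-Jozsa
derivation, with $a \odot y$ in place of $f(y)$),

\[ H^{\otimes n} \left( \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n-1} (-1)^{a \odot y} \, |y\rangle^n \right) = \frac{1}{2^n} \sum_{z=0}^{2^n-1} \underbrace{ \left( \sum_{y=0}^{2^n-1} (-1)^{a \odot y} (-1)^{z \odot y} \right) }_{G(z)} |z\rangle^n , \]

which also defines a scalar function G(z), used to simplify the analysis. So, the final
output is an expansion along the z-basis,

\[ \text{data register at access point Q} = \frac{1}{2^n} \sum_{z=0}^{2^n-1} G(z) \, |z\rangle^n . \]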


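Consider G(z) in the two cases:


• z = a. Every term of the sum is 1, since

\[ G(a) = \sum_{y=0}^{2^n-1} (-1)^{a \odot y} (-1)^{a \odot y} = \sum_{y=0}^{2^n-1} 1 = 2^n , \]

so the amplitude of the CBS ket $|a\rangle$ is

\[ \frac{G(a)}{2^n} = \frac{2^n}{2^n} = 1 . \]


• z ≠ a. We don’t even have to sweat the computation for the amplitudes for the
other kets, because once we know that $|a\rangle$ has amplitude 1, the others have to
be 0. (Why?)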


We have shown that at access point Q, the CBS state $|a\rangle$ is sitting in the data register.
Since it is a CBS state, it won’t collapse to anything other than what it already is,
and we are guaranteed to get a reading of “a,” our sought-after n-bit binary number.
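
As a sanity check on the algebra, the following sketch computes the data register amplitudes at access point Q by brute force. It is a classical simulation under stated assumptions, not a quantum program: bit strings are stored as integers, the target register is dropped (it only supplies the phase kick-back), and the function name and sample values are hypothetical.

```python
import math


def dot_mod2(a: int, x: int) -> int:
    """Mod-2 dot product of two bit strings stored as integers."""
    return bin(a & x).count("1") % 2


def amplitudes_at_q(a: int, n: int) -> list:
    """Data register amplitudes after the final n-th order Hadamard."""
    size = 2 ** n
    scale = 1 / math.sqrt(size)
    # Access point P: (1/sqrt(2))^n * sum_y (-1)^(a.y) |y>
    at_p = [scale * (-1) ** dot_mod2(a, y) for y in range(size)]
    # Final Hadamard: the amplitude of |z> picks up (-1)^(y.z) from each |y>.
    return [
        scale * sum((-1) ** dot_mod2(y, z) * at_p[y] for y in range(size))
        for z in range(size)
    ]


n, a = 4, 0b1011  # hypothetical hidden constant
amps = amplitudes_at_q(a, n)
measured = max(range(2 ** n), key=lambda z: abs(amps[z]))
print(format(measured, "04b"))
```

The only non-zero amplitude sits on the ket labeled by the hidden constant, which is what the two-case analysis of G(z) predicts.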


Time Complexity 


Because the quantum circuit evaluates $U_f$ only once, this is a constant time solution.
What about the classical solution? 


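Deterministic. Classically we would need a full n evaluations of f in order to
get all n coordinates of a. That is, we would use the input value

\[ e_k \equiv \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \longleftarrow k\text{th element} \]

in order to compute the kth coordinate of a based on

\[ f(e_k) = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \odot \begin{pmatrix} a_0 \\ \vdots \\ a_k \\ \vdots \\ a_{n-1} \end{pmatrix} = a_k . \]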


After n passes we would have all n coordinates of a and be done. Thus, the classical 
algorithm grows linearly with the number of inputs n. This kind of growth is called 
linear growth or linear time complexity as it requires longer to process more inputs, 
but if you double the number of inputs, it only requires twice as much time. This 
is not as bad as the exponential growth of the classical deterministic Deutsch-Jozsa 
algorithm. 
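
The n-pass classical procedure can be sketched as follows. This is an illustration under the assumption that bit strings are stored as integers, so that the unit vector e_k is the integer with only bit k set; the helper names and the query counter are hypothetical.

```python
def dot_mod2(a: int, x: int) -> int:
    """Mod-2 dot product of two bit strings stored as integers."""
    return bin(a & x).count("1") % 2


def recover_a(f, n: int):
    """Recover the hidden constant using one evaluation of f per coordinate."""
    a, queries = 0, 0
    for k in range(n):
        e_k = 1 << k          # unit vector: a single 1 in position k
        queries += 1
        if f(e_k) == 1:       # f(e_k) = a_k
            a |= e_k
    return a, queries


hidden = 0b1011               # hypothetical hidden constant, n = 4
found, used = recover_a(lambda x: dot_mod2(hidden, x), 4)
print(format(found, "04b"), used)
```

The query count equals n no matter what the hidden constant is, which is the linear growth described above.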




(Alternatively, we can measure the classical deterministic solution to the current
problem in terms of the encoded integer size, $N = 2^n$. In that case the classical
algorithm is logarithmic in N, which doesn’t sound as bad as linear, even though this
is just a different accounting system.)


Non-Deterministic. What if, classically, we evaluate f a fixed number of times,
M, and allow for some error, $\varepsilon$, close to 0? Can we succeed if M is independent of
the number of inputs, n? No. In fact, even if we allowed M to grow with n by taking
M = n − 1, we would still be forced to guess at the last coordinate. This would
produce a 50% error since the last coordinate could be 1 or 0 with equal probability.
We can’t even make a good guess (small error $\varepsilon$ close to 0) if we skip a measly one of
the n evaluations, never mind skipping the many evaluations that would be foisted
on us if we let M be constant and watched n grow far beyond M.
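
The forced guess can be seen by listing every constant that is still consistent with the answers after n − 1 evaluations. The sketch below assumes the evaluations are the first n − 1 unit vectors; the function name and sample values are hypothetical.

```python
def dot_mod2(a: int, x: int) -> int:
    """Mod-2 dot product of two bit strings stored as integers."""
    return bin(a & x).count("1") % 2


def consistent_candidates(f, n: int, queried: list) -> list:
    """All n-bit constants that agree with f on every queried input."""
    return [
        c for c in range(2 ** n)
        if all(dot_mod2(c, x) == f(x) for x in queried)
    ]


n, hidden = 4, 0b1011                      # hypothetical values
f = lambda x: dot_mod2(hidden, x)
queried = [1 << k for k in range(n - 1)]   # skip exactly one coordinate
candidates = consistent_candidates(f, n, queried)
print([format(c, "04b") for c in candidates])
```

Two candidates survive, differing only in the skipped coordinate, so the best we can do is a coin toss between them.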


So in practical terms, the classical solution is not constant time, and we have
a clear separation between quantum and classical solutions to the question. This
is a stronger result than quantum computing provided for the Deutsch-Jozsa problem
where, when we allowed a small error, there was no real difference between the
quantum and classical solutions.


13.6 Generalized Born Rule 


We can’t end this lesson without providing a final generalization of Traits #15 and
#15′, the Born rule for bipartite and tripartite systems. We'll call it Trait #15″,
the generalized Born rule. The sentiment is the same as its smaller order cousins.


In rough language it says that when we have a special kind of sum of separable 
states from two high-dimensional spaces, A and B, an A-measurement will cause the 
overall state to collapse to one of the separable terms, thereby selecting the B-state
of that term. 


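13.6.1 Trait #15″ (Generalized Born Rule for (n+m)th order States)


Assume that we have an (n + m)th order state, $|\varphi\rangle^{n+m}$, in the product space
$A \otimes B = \mathcal{H}_{(n)} \otimes \mathcal{H}_{(m)}$ with the property that $|\varphi\rangle^{n+m}$ can be written as the following kind
of sum:

\[ |\varphi\rangle^{n+m} = |0\rangle^n_A \, |\psi_0\rangle^m_B + |1\rangle^n_A \, |\psi_1\rangle^m_B + \cdots + |2^n - 1\rangle^n_A \, |\psi_{2^n-1}\rangle^m_B . \]

In this special form, notice that each term is a separable product of a distinct CBS
ket from A and some general state from B, i.e., the kth term is

\[ |k\rangle^n_A \, |\psi_k\rangle^m_B . \]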


We know by QM Trait #7 (post-measurement collapse), that the state of the component
space $A = \mathcal{H}_{(n)}$ must collapse to one of the CBS states, call it

\[ |k_0\rangle^n_A . \]

The generalized Born rule assures us that this will force the component space
$B = \mathcal{H}_{(m)}$ to collapse to the matching state,

\[ |\psi_{k_0}\rangle^m_B , \]

only it will become normalized after the collapse to the equivalent

\[ \frac{ |\psi_{k_0}\rangle^m }{ \sqrt{ \langle \psi_{k_0} | \psi_{k_0} \rangle } } . \]

(Note that in the last expression I suppressed the superscripts m in the denominator
and subscripts, B, everywhere, to avoid clutter.)
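
Here is a small numeric sketch of the rule. It assumes the overall state is stored as a list whose kth entry holds the (unnormalized) B-state amplitudes paired with the A-basis ket labeled k; the function name and the sample state are hypothetical. It also uses the standard fact that, for a normalized overall state, the probability of the A-measurement yielding k is the squared norm of the kth B-state.

```python
import math


def measure_a(psi: list, k0: int):
    """Return (probability of reading k0 on A, normalized B-state afterwards)."""
    norm_sq = sum(abs(c) ** 2 for c in psi[k0])       # <psi_k0 | psi_k0>
    collapsed_b = [c / math.sqrt(norm_sq) for c in psi[k0]]
    return norm_sq, collapsed_b


# Hypothetical state with n = 1 and m = 1:
#   |0>_A (0.5|0> + 0.5|1>)_B  +  |1>_A (sqrt(0.5)|0>)_B
psi = [[0.5, 0.5], [math.sqrt(0.5), 0.0]]

p0, b0 = measure_a(psi, 0)
p1, b1 = measure_a(psi, 1)
print(p0, b0)
print(p1, b1)
```

If A reads 0, B is left in an equal superposition and its own measurement is still undecided; if A reads 1, B is left in a CBS ket and its measurement is certain.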


Discussion 


The assumption of this rule is that the component spaces A and B are in an entangled
state which can be expanded as a sum, all terms of which have A basis factors. Well,
any state in $A \otimes B$ can be expressed this way; all we have to do is express
it along the full $2^{n+m}$ product basis kets, then collect terms having like A-basis kets
and factor out the common ket in each term. So the assumption isn’t so much about
the state $|\varphi\rangle^{n+m}$ as it is about how the state is written.


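The next part reminds us that when an observer of the state space A takes a
measurement along the natural basis, her only possible outcomes are one of the $2^n$
basis kets:

\[ \left\{ |0\rangle^n_A , \; |1\rangle^n_A , \; |2\rangle^n_A , \; \ldots , \; |2^n - 1\rangle^n_A \right\} , \]

so only one term in the original sum survives. That term tells us what a B-state
space observer now has before him:

\[
\begin{aligned}
A \searrow |0\rangle^n \quad &\Longrightarrow \quad B \searrow \frac{ |\psi_0\rangle^m }{ \sqrt{ \langle \psi_0 | \psi_0 \rangle } } , \\
A \searrow |1\rangle^n \quad &\Longrightarrow \quad B \searrow \frac{ |\psi_1\rangle^m }{ \sqrt{ \langle \psi_1 | \psi_1 \rangle } } , \\
A \searrow |2\rangle^n \quad &\Longrightarrow \quad B \searrow \frac{ |\psi_2\rangle^m }{ \sqrt{ \langle \psi_2 | \psi_2 \rangle } } ,
\end{aligned}
\]

and, in general,

\[ A \searrow |k\rangle^n \quad \Longrightarrow \quad B \searrow \frac{ |\psi_k\rangle^m }{ \sqrt{ \langle \psi_k | \psi_k \rangle } } , \qquad \text{for } k = 0, 1, \ldots, 2^n - 1 . \]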


This does not tell us what a B-state observer would measure, however, since the state
he is left with, call it

\[ \frac{ |\psi_{k_0}\rangle }{ \sqrt{ \langle \psi_{k_0} | \psi_{k_0} \rangle } } , \]

is not assumed to be a CBS ket of the B space. It is potentially a superposition
of CBS kets, itself, so has a wide range of possible collapse probabilities. However,
just knowing the amplitudes of the state $|\psi_{k_0}\rangle_B$ (after normalization) narrows down
what B is likely to find. This measurement collapse, forced by an observer of A, but
experienced by an observer of the entangled B, is the crux of the remainder of the
course.


Size of the Superposition. We’ve listed the sum as potentially having the 
maximum number of $2^n$ terms, based on the underlying assumption that each term
has an A-basis ket. However, it often has fewer terms, in which case the rule still 
applies, only then there are fewer collapse possibilities. 


Role of A and B. There was nothing special about the component state space 
A. We could have expanded the original state, 


\[ |\varphi\rangle^{n+m} , \]

in such a way that each term in the sum had a B-space CBS ket, $|k\rangle^m_B$, and the A-space
partners were general states, $|\psi_k\rangle^n_A$.


13.7 Towards Advanced Quantum Algorithms 


Even though Bernstein-Vazirani provided a separation between quantum and
classical computing, the classical problem was still easy, meaning it did not grow
exponentially with n; the improvement is not yet dramatic enough to declare quantum
computing a game changing technology. For that, we will need to study two landmark
algorithms,


• the quantum solution to Simon’s problem, and


• the quantum solution to Shor’s problem.


For those we need to add a little more math to our diet, so we take a small-but-interesting
side trip next time.




Chapter 14 


Probability Theory 


14.1 Probability in Quantum Computing 


14.1.1 Probability for Classical Algorithms 


In the few quantum algorithms we’ve had, the quantum circuit solved the problem in 
a single pass of the circuit: one query of the oracle. We didn’t need probability for 
that. We did use probability, informally, to estimate how long a classical algorithm 
would take to complete if we allowed for experimental error. In the Deutsch-Jozsa 
problem, when we considered a classical solution using the M-and-guess method, we 
expressed the event of failure (“all trial outcomes were the Same $\wedge$ (read “and”)
f is Balanced”) by

\[ \mathcal{S} \wedge \mathcal{B} , \]

and argued intuitively that the probability of this happening was

\[ P(\mathcal{S} \wedge \mathcal{B}) = \frac{1}{2^M} . \]


That analysis was necessary to evaluate the worthiness of the quantum alternative’s 
constant-time solution. 


14.1.2 Probability for Quantum Algorithms 


Soon, we will study quantum algorithms that require multiple queries of a quantum 
circuit and result in an overall performance which is non-deterministic, i.e., proba- 
bilistic. Here’s a preview of a small section of Simon’s quantum algorithm: 


... (previous steps) ...


• Repeat the following loop at most n + T times or until we get n − 1 linearly
independent vectors, whichever comes first.

... (description of loop) ...

• If the above loop ended after n + T full passes, we failed.

• Otherwise, we succeeded. Add an nth vector, $w_n$, which is ...


... (following steps) ...


Estimating the probabilities of this algorithm will require more than intuition; it will 
require a few probability laws and formulas. 


Today we'll cover those laws and formulas. We'll use them to solidify our earlier 
classical estimations, and we’ll have them at-the-ready for the upcoming probabilistic 
quantum algorithms. 


14.2 The Essential Vocabulary: Events vs. Probabilities


When we build and analyze quantum circuits, we’ll be throwing around the terms 
event and probability. Events (word-like things) are more fundamental than probabil- 
ities (number-like things). Here is an informal description (rigor to follow). 


14.2.1 Events 


An event is something that happens, happened, will happen or might happen. 

Events are described using English, Mandarin, Russian or some other natural 
language. They are not numbers, but descriptions. There is no such thing as the 
event “8.” There is the event that Salim rolls an 8 at dice, or Han missed the 8 PM
train. 


• The Higgs particle might decay into four muons.


• An application error occurred at 6:07 PM.

[Figure: event-log entry showing an Application Error logged 8/16/2014 6:07:00 PM, Event ID 1000.]


• Ms. Park shot a 1-under-par yesterday at Pebble Beach.


We will often use a script letter like $\mathcal{E}, \mathcal{I}, \mathcal{C}, \ldots$ to designate events, as in


• “... Let $\mathcal{E}$ be the event that two sample measurements of our quantum circuit
are equal ...”,


• “... Let $\mathcal{I}$ be the event that all the vectors we select at random are linearly
independent ...”, or


• “... Let $\mathcal{C}$ be the event that the number given to us is relatively prime to 100 ...”.

14.2.2 Probabilities 


Probabilities are the numeric likelihoods that certain events will occur. They are
always numbers between 0 and 1, inclusive. If the probability of an event is
0, it cannot occur; if it is 1, it will occur with 100% certainty; if it is .7319, it will
occur 73.19% of the time; and so on. We express the probabilities of events using P()
notation, like so


P(Ms. Park will shoot a birdie on the 18th hole) =  .338, or 
P(at least one measurement is > -13.6eV) =  .021. 


In the heat of an analysis, we’ll be using the script letters that symbolize the events 
under consideration. They will be defined in context, so you'll always know what 
they stand for. The corresponding probabilities of the events will be expressed, then, 
using syntax like 


P($\mathcal{E}$) = .99996,
P($\mathcal{I}$) < .1803, or
P($\mathcal{C}$) > .99996.


14.3 The Quantum Coin Flip


Many probability lessons use the familiar coin flip as a source of elementary examples. 
But we quantum computer scientists can use a phenomenon which behaves exactly like 
a coin flip but is much more interesting: a measurement of the quantum superposition 
state 


\[ \frac{ |0\rangle + |1\rangle }{ \sqrt{2} } . \]


If we prepare this state by applying a Hadamard gate, H, to the basis state $|0\rangle$, our
“coin” is waiting at the output gate. In other words, the coin is $H|0\rangle$:

[Circuit: $|0\rangle$ enters $H$, which outputs $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$.]


[Exercise. We recognize this state under an alias; it also goes by the name $|+\rangle$.
Recall why.]


Measuring the output state $H|0\rangle$ is our actual “toss.” It causes the state to collapse
to either $|0\rangle$ or $|1\rangle$, which we would experience by seeing a “0” or “1” on our meter.

\[ \frac{ |0\rangle + |1\rangle }{ \sqrt{2} } \; \searrow \; |0\rangle \ \text{or} \ |1\rangle \]

(Here, $\searrow$ means collapses to.)


That’s equivalent to getting a heads or tails. Moreover, the probability of getting
either one of these outcomes (the eigenvalues) is determined by the amplitudes of
their respective CBS kets (the eigenvectors). Since both amplitudes are $1/\sqrt{2}$, the
probabilities are

\[ P(\text{measuring } 0) = \left| \frac{1}{\sqrt{2}} \right|^2 = \frac{1}{2} \qquad \text{and} \]

\[ P(\text{measuring } 1) = \left| \frac{1}{\sqrt{2}} \right|^2 = \frac{1}{2} . \]


So we have a perfectly good coin. Suddenly learning probability theory seems a lot 
more appealing, and as a bonus we’ll be doing a little quantum computing along the 
way. 
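
For readers who like to see numbers, here is a minimal classical sketch of the quantum coin. It assumes the usual 2×2 Hadamard matrix and imitates the measurement with a pseudo-random number generator; the names `coin`, `probs` and `toss` are hypothetical, and nothing here is a real quantum measurement.

```python
import math
import random

s = 1 / math.sqrt(2)
H = [[s, s],
     [s, -s]]            # Hadamard gate as a 2x2 matrix
ket0 = [1.0, 0.0]        # the basis state |0>

# The coin: H|0>, computed as a matrix-vector product.
coin = [sum(H[r][c] * ket0[c] for c in range(2)) for r in range(2)]

# Born rule: probability = squared magnitude of the amplitude.
probs = [abs(amp) ** 2 for amp in coin]


def toss(rng=random) -> int:
    """Imitate one measurement of the coin along the z-basis."""
    return 0 if rng.random() < probs[0] else 1


print(probs, [toss() for _ in range(10)])
```

Both probabilities come out to one half (up to floating point rounding), so repeated tosses behave like a fair coin.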

[Exercise. We could have used $H|1\rangle$ as our coin. Explain why.]


[Exercise. The above presupposes that we will measure the output state along
the z-basis, $\{ |0\rangle, |1\rangle \}$. What happens if, instead, we measure the same state along
the x-basis, $\{ |+\rangle, |-\rangle \}$? Can we use this method as a fair coin flip?]


14.4 Experimental Outcomes and the Sample Space 
We now give the definitions of events and probabilities a bit more rigor. 




14.4.1 Outcomes 


Every experiment or set of measurements associated with our quantum algorithms
comes with a set of all possible outcomes. An outcome is the most basic result we
can imagine.


A Single Qubit Coin Flip


If the experiment is to prepare exactly one quantum coin, i.e., the state $H|0\rangle$, and
then measure it, there are two possible outcomes:


- “Meter reads 0” ↔ “state collapsed to $|0\rangle$” and


- “Meter reads 1” ↔ “state collapsed to $|1\rangle$.”

Usually, though, we simply say that the possible outcomes are 0 and 1.


A Ten Qubit Coin Flip 


A more involved experiment would be to prepare ten identical-but-distinct quantum 
coins, labeled #0 - #9 (because we are computer scientists), then measure each one, 
resulting in ten measurements. 


[Circuit: ten quantum coins, labeled #0 through #9; each is a qubit prepared as $|0\rangle$, passed through $H$, and then measured.]


Now, defining the outcomes isn’t so clear. 

Maybe we care about the number of 0s and 1s, but not which preparations (say, 
#1, #7, #9) measured 0 and which (say, #0, #2, #3, #4, #5, #6, #8) measured 1. The 
outcomes could be defined by 11 possible “head” counts, where head means getting 
a 0: 


- “Total 0s = 0”
- “Total 0s = 1”
- “Total 0s = 2”
- ...
- “Total 0s = 10”


Again, we could abbreviate this by saying, “the possible outcomes are 0-10.” 


One problem with this definition of “outcome” is that some are more likely than 
others. It is usually beneficial to define outcomes so that they are all equally — 
or nearly equally — likely. So, we change our outcomes to be the many ten-tuples 
consisting of 


- “Results were (0,0,0,0,0,0,0,0,0,0)”
- “Results were (0,0,0,0,0,0,0,0,0,1)”
- “Results were (0,0,0,0,0,0,0,0,1,0)”
- ...
- “Results were (1,1,1,1,1,1,1,1,1,1)”


There are now a lot more outcomes ($2^{10} = 1024$), but they each have the same
likelihood of happening. (If you don’t believe me, list the eight outcomes for three
coins and start flipping.) A shorter way to describe the second breakdown of outcomes
is,


“a possible outcome has the form

\[ (x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9) \]

where $x_k$ is the kth measurement result.”
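
The difference between the two ways of defining outcomes is easy to check by enumeration. The sketch below lists all ten-tuples and then groups them by their number of 0s; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# Second definition: every ordered ten-tuple of 0s and 1s is one outcome.
outcomes = list(product((0, 1), repeat=10))

# First definition: group the tuples by "Total 0s".
zeros_count = Counter(t.count(0) for t in outcomes)

print(len(outcomes))
for total in range(11):
    print(total, zeros_count[total])
```

Each ten-tuple is one of 1024 equally likely results, but the eleven "Total 0s" outcomes are backed by very different numbers of tuples, which is why they are not equally likely.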


14.4.2 Requirements when Defining Outcomes 


While there are many ways to partition all the possible results of your experiment, 
there is usually one obvious way that presents itself. In the ten qubit coin toss, we saw 
that the second alternative had some advantages. But there are actually requirements 
that make certain ways of dividing up the would-be outcomes illegal. To be legal, a 
partition of outcomes must satisfy two conditions. 


1. Outcomes must be mutually exclusive (a.k.a. disjoint). 


2. Outcomes must collectively represent every possible result of the experiment. 


[Exercise. Explain why the two ways we defined the outcomes of the ten qubit
coin toss are both legal.]


14.4.3 An Incorrect Attempt at Defining Outcomes


It’s natural to believe that organizing the ten qubit coin toss by each
individual qubit outcome (such as “the 4th qubit measured a 0”) is a reasonable
partition of the experiment. After all, there are ten individual circuits. Here is the
(unsuccessful) attempt at using that as our outcome set.


“The outcomes are to be the individual measurement results 


$\mathcal{Z}_3$ = measurement of qubit #3 is 0,
$\mathcal{Z}_8$ = measurement of qubit #8 is 0,
$\mathcal{O}_5$ = measurement of qubit #5 is 1,
$\mathcal{O}_0$ = measurement of qubit #0 is 1,


and so on.”


($\mathcal{Z}$ would mean that the event detects a Zero, while $\mathcal{O}$, script-O, means the event
detects a One. Meanwhile, the subscript indicates which of the ten measurements we
are describing.)


There are ten measurements, each one can be either zero or one, and the above 
organization produces 20 alleged outcomes. However, this does not satisfy the two 
requirements of “outcome.” 


[Exercise. Explain why?] 


14.4.4 Events 
Definitions 


Event. An event is a subset of outcomes. 


Simple Event. An event that contains exactly one outcome is called a 
simple event (a.k.a. elementary event). 


Please recognize that outcomes are not events. A set containing an outcome is an 
event (a simple one, to be precise). 


Compound Event. An event that contains more than one outcome is 
called a compound event. 


Describing Events 


Events can be described either by the actual sets, using set notation, or an English 
(or French or Vietnamese) sentence. 


Examples of simple event descriptions for our ten qubit coin toss experiment are


gl (0 © Panel oe me Pa © Pap 0 pal bvee I 

eae She 0 

- { (0, 0, 0, 0, 0, 0, 0, 0, 0, 0) } 

- “The first five qubits measure 0 and the last five measure 1.” 
- “All ten qubits measure 1.” 


- “Qubit #0 measures 0, and as the qubit # increases, the measurements alternate.”


Examples of compound event descriptions for our ten qubit coin toss experiment
are


SAN Oe alee els ale Oe 0, Ace zo Cke cea ck cba NOs Qed, (i) 
- “The first five qubits measure 0.” 
- “The fourth qubit measures 1.” 


- “As the qubit # increases, the measurements alternate.”


[Exercise. Describe five simple events, and five compound events. Use some
set notation and some natural English descriptions. You can use set notation that
leverages formulas rather than listing all the members, individually.]


14.4.5 The Sample Space 


Sample Space. The sample space is the set of all possible outcomes.
It is referred to using the Greek $\Omega$.


In our ten qubit coin toss, $\Omega$ is the set of all ordered ten-tuples consisting of 0s and
1s,

\[ \Omega = \left\{ (x_0, x_1, x_2, \ldots, x_9) \;\middle|\; x_k \in \{0, 1\} \right\} . \]

At the other extreme, there is the null event.


Null Event. The event consisting of no outcomes, a.k.a. the empty set, is the
null event. It is represented by the symbol, $\emptyset$.
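
Since events are just subsets of the sample space, they map directly onto a set type in code. The following sketch represents outcomes as tuples and events as Python sets; the variable names are hypothetical, and the two sample events are taken from the English descriptions earlier in this section.

```python
from itertools import product

# The sample space: all ordered ten-tuples of 0s and 1s.
omega = set(product((0, 1), repeat=10))

# A simple event contains exactly one outcome: "All ten qubits measure 1."
all_ones = {(1,) * 10}

# A compound event contains several: "The first five qubits measure 0."
first_five_zero = {t for t in omega if t[:5] == (0,) * 5}

# The null event contains no outcomes at all.
null_event = set()

print(len(omega), len(all_ones), len(first_five_zero), len(null_event))
```

Note that the outcome `(1,) * 10` is a tuple while the simple event `all_ones` is a set holding that tuple, mirroring the distinction between outcomes and events.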




14.4.6 Set Operations 
Unions 


One way to express $\Omega$ is as a compound event consisting of the set union of simple
events. Let’s do that for the ten qubit coin flip using big-U notation, which is just
like summation notation, $\Sigma$, only for unions,

\[ \Omega = \bigcup_{x_k \in \{0, 1\}} \left\{ (x_0, x_1, x_2, \ldots, x_9) \right\} . \]


Example. We would like to represent the event, $\mathcal{F}$, that the first four quantum
coin flips in our ten qubit experiment are all the same. One expression would be

\[ \mathcal{F} \equiv \bigcup_{w, \, x_k \in \{0, 1\}} \left\{ (w, w, w, w, x_4, x_5, \ldots, x_9) \right\} . \]


Example. We would like to represent the event, $\mathcal{F}'$, that the first four quantum
coin flips in our ten qubit experiment are all the same, but the first five are not all
the same. One expression would be

\[ \mathcal{F}' \equiv \bigcup_{w, \, x_k \in \{0, 1\}} \left\{ (w, w, w, w, w \oplus 1, x_5, x_6, \ldots, x_9) \right\} , \]

where $\oplus$ is addition mod-2.


[Exercise. Find a representation of the event, $\mathcal{E}$, that the first and last quantum
coin flip in our ten qubit experiment are different.]


[Exercise. Find a representation of the event, $\mathcal{C}$, that the number of 1s in our
quantum coin flip in our ten qubit experiment is odd.]


[Exercise. Find a representation of the event, $\mathcal{H}$, that the Hamming weight, i.e.,
the sum of the outcomes, of our ten qubit quantum coin flip is > 6.]


Intersections 


We can use intersections to describe sets, too. For example, if we wanted an event
$\mathcal{G}$ in our ten qubit coin flip in which both a) the first four are the same and also b)
the first and last are different, we might express that using

\[ \mathcal{G} \equiv \mathcal{F} \cap \mathcal{E} , \]

where $\mathcal{F}$ and $\mathcal{E}$ were defined by me (and you) a moment ago.


Of course, we could also try representing that event directly, but once we have a 
few compound events defined, we often leverage them to produce new ones. 


[Exercise. Use intersections and the events already defined above to describe 
the event, @, in our ten qubit coin flip in which both a) the Hamming weight has 
absolute value > 5 and b) the sum of the 1s is odd.]


Differences 


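What if we wanted to discuss the ten qubit coin flip event, $\mathcal{D}$, which had odd sums,
but whose first four flips are not equal? We could leverage our definitions of $\mathcal{C}$ and
$\mathcal{F}$ and use difference notation, “−” or “\”, like so:

\[ \mathcal{D} \equiv \mathcal{C} - \mathcal{F} \qquad \text{or} \qquad \mathcal{D} \equiv \mathcal{C} \setminus \mathcal{F} . \]

That notation instructs the reader to start with $\mathcal{C}$, then remove all the outcomes that
satisfy (or are ∈) $\mathcal{F}$.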


Complements and the ¬ Operator


When we start with the entire sample space $\Omega$, and subtract an event, $\mathcal{S}$,

\[ \Omega - \mathcal{S} , \]

we use a special term and notation: the complement of $\mathcal{S}$, written in several ways,
depending on the author or context,

\[ \mathcal{S}^c , \qquad \overline{\mathcal{S}} , \qquad \neg \mathcal{S} . \]

All are usually read “not $\mathcal{S}$”, and the last reprises the logical negation operator, ¬.
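
The four set operations of this section correspond one-to-one to Python's set operators. The sketch below builds the two events defined explicitly in the text (first four flips the same; first four the same but not the first five) and combines them; the variable names are hypothetical, and the events left as exercises are deliberately not constructed.

```python
from itertools import product

omega = set(product((0, 1), repeat=10))

# F: the first four flips are all the same.
F = {t for t in omega if len(set(t[:4])) == 1}

# F': the first four are the same, but the first five are not.
F_prime = {t for t in F if t[4] != t[0]}

union = F | F_prime            # set union
intersection = F & F_prime     # set intersection
difference = F - F_prime       # set difference
complement = omega - F         # complement, "not F"

print(len(F), len(F_prime), len(union),
      len(intersection), len(difference), len(complement))
```

Because every outcome of F' is also in F, the union gives back F and the intersection gives back F', while the difference keeps the outcomes whose first five flips are all the same.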


14.5 Alternate Views of the Ten Qubit Coin Flip 


This is a good time to introduce other ways to view the outcomes of this ten qubit 
experiment that will help us when we get to some quantum algorithms later in the 
course. 


A 10-Dimensional Space With Coordinates 0 and 1 


For each outcome,

\[ (x_0, x_1, x_2, \ldots, x_9) , \]

we look at it as a ten component vector whose coordinates are 0 or 1. The vectors
can be written in either row or column form,

\[ (x_0, x_1, x_2, \ldots, x_9) = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_9 \end{pmatrix} . \]


An outcome is already in a vector-like format so it’s not hard to see the correspon- 
dence. For example, the outcome (0, 1, 0, ..., 1) corresponds to the vector 


\[ \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 1 \end{pmatrix} . \]


This seems pretty natural, but as usual, I’ll complicate matters by introducing the
addition of two such vectors. While adding vectors is nothing new, the concept
doesn’t seem to have much meaning when you think of them as outcomes. (What
does it mean to “add two outcomes of an experiment?”) Let’s not dwell on that for
the moment but proceed with the definition. We add vectors in this space by taking
their component-wise mod-2 sum $\oplus$ or, equivalently, their component-wise XOR,

\[ \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \equiv \begin{pmatrix} 0 \oplus 1 \\ 1 \oplus 0 \\ 0 \oplus 1 \\ \vdots \\ 1 \oplus 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 0 \end{pmatrix} . \]


To make this a vector space, I’d have to tell you its scalars (just the two numbers
0 and 1), operations on the scalars (simple multiplication, “·”, and mod-2 addition,
“⊕”), etc. Once the details were filled in we would have the “ten dimensional vectors
mod-2,” or $(\mathbb{Z}_2)^{10}$. The fancy name expresses the fact that the vectors have 10
components (the superscript 10 in $(\mathbb{Z}_2)^{10}$), in which each component comes from the
set {0, 1} (the subscript 2 in $(\mathbb{Z}_2)^{10}$). You might recall from our formal treatment of
classical bits that we had a two dimensional mod-2 vector space, $\mathbb{B} = \mathbb{B}^2$, which in
this new notation is $(\mathbb{Z}_2)^2$.


We can create mod-2 vectors of any dimension, of course, like the five-dimensional
$(\mathbb{Z}_2)^5$ or more general n-dimensional $(\mathbb{Z}_2)^n$. The number of components, 10, 5 or n,
tells you the dimension of the vector space.
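
Mod-2 vector addition is a one-liner in code, since the mod-2 sum of two bits is their XOR. This sketch stores vectors as tuples; the function name `add_mod2` is hypothetical, and the sample vectors keep only the four components that are visible in the displayed sum.

```python
def add_mod2(v: tuple, w: tuple) -> tuple:
    """Component-wise mod-2 sum (equivalently, component-wise XOR)."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return tuple(a ^ b for a, b in zip(v, w))


v = (0, 1, 0, 1)
w = (1, 0, 1, 1)
print(add_mod2(v, w))
print(add_mod2(v, v))
```

The second line of output shows a peculiarity of this space: every vector added to itself gives the zero vector.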


The Integers from 0 to $2^{10} - 1$


A second view of a ten qubit coin flip is that of a ten bit integer from 0 to 1023,
constructed by concatenating all the results,

\[ x_0 \, x_1 \, x_2 \cdots x_9 . \]


Examples of the correspondence between these last two views are:


mod-2 vector                      binary                     integer
(0, 0, ..., 0, 0, 0)        ↔     00···000             ↔     0
(0, 0, ..., 0, 0, 1)        ↔     00···001             ↔     1
(0, 0, ..., 0, 1, 0)        ↔     00···010             ↔     2
(0, 0, ..., 0, 1, 1)        ↔     00···011             ↔     3
(0, 0, ..., 1, 0, 0)        ↔     00···100             ↔     4
...
(x_0, x_1, x_2, ..., x_9)   ↔     x_0 x_1 x_2 ··· x_9  ↔     x


This gives us additional vocabulary to describe events. A few examples of some
interesting event-descriptions are:


• All outcomes between 500 and 700. (Uses integer interpretation.)


• All outcomes divisible by 7. (Uses integer interpretation.)


• All outcomes which are relatively prime (a.k.a. coprime) to a given number.
(Uses integer interpretation.)


• All outcomes that are linearly independent of the set (event) containing the
three vectors


1 (Oy eh OP OS Dy Ope D150), (Oy cinls Oa): 


(Uses vector interpretation.) See below, for a review of linear independence. 


The last two are of particular importance when we consider quantum period-finding. 
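
Moving between the vector view and the integer view is a matter of reading the tuple as a binary numeral. The sketch below assumes, as the correspondence table suggests, that the first coordinate is the most significant bit; the helper names are hypothetical, and "between 500 and 700" is read as inclusive for illustration.

```python
from itertools import product


def to_int(bits: tuple) -> int:
    """Read a mod-2 vector as a binary numeral, first coordinate first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def to_bits(x: int, n: int = 10) -> tuple:
    """Inverse of to_int for an n-component vector."""
    return tuple((x >> (n - 1 - k)) & 1 for k in range(n))


omega = set(product((0, 1), repeat=10))

# Two events described with the integer interpretation.
between_500_and_700 = {t for t in omega if 500 <= to_int(t) <= 700}
divisible_by_7 = {t for t in omega if to_int(t) % 7 == 0}

print(to_int((0, 0, 0, 0, 0, 0, 0, 1, 0, 0)), to_bits(3))
print(len(between_500_and_700), len(divisible_by_7))
```

The events are still sets of ten-tuples; the integer view only supplies a more convenient language for saying which tuples belong.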


14.6 Mod-2 Vectors in Probability Computations for Quantum Algorithms


In order to manufacture some particularly important examples for our upcoming
algorithms, we will take a moment to shore up our concept of linear independence,
especially as it pertains to the mod-2 vector space $(\mathbb{Z}_2)^n$.


14.6.1 Ordinary Linear Independence and Spanning 


Definition of Linear Independence (Review) 


Linear Independence. In a vector space, a set of vectors $\{v_0, \ldots, v_{n-1}\}$
is linearly independent if you cannot form a non-trivial linear combination
of them which produces the zero vector, i.e.,

\[ c_0 v_0 + c_1 v_1 + \cdots + c_{n-1} v_{n-1} = 0 \quad \Longrightarrow \quad \text{all } c_k = 0 . \]

Another way to say this is that no vector in the set can be expressed as a linear
combination of the others.

[Exercise. Prove that the last statement is equivalent to the definition.]


[Exercise. Show that the zero vector can never be a member of a linearly independent
set.]


[Exercise. Show that a singleton (a set consisting of any single non-zero vector)
is a linearly independent set.]


Definition of Span of a Set of Vectors (Review) 


The span of a set of vectors is the (usually larger) set consisting of all vectors that 
can be constructed by taking linear combinations of the original set. 


The Span of a Set of Vectors. The span of a set of m vectors $S = \{v_k\}_{k=0}^{m-1}$ is

\[ \left\{ c_0 v_0 + c_1 v_1 + \cdots + c_{m-1} v_{m-1} \;\middle|\; c_k \text{ are scalars} \right\} . \]


The set S does not have to be a linearly independent set. If it is not, then it means 
we can omit one or more of its vectors without reducing its span. 


[Exercise. Prove it.] 


When we say that a vector, w, is in the span of S, we mean that w can be written 
as a linear combination of the $v_k$s in S. Again, this does not require that the original
{v;} be linearly independent. 


[Exercise. Make this last definition explicit using formulas. ] 


When a vector, w, is not in the span of S, adding w to S will increase S’s span. 


14.6.2 Linear Independence and Spanning in the Mod-2 Sense 


Because the only scalars available to “weight” each vector in a mod-2 linear combination
are 0 and 1, the span of any set of mod-2 vectors reduces to simple sums.


The Span of a Set of Mod-2 Vectors. The span of a set of m mod-2
vectors $S = \{v_k\}_{k=0}^{m-1}$ is

\[ \left\{ v_{l_0} + v_{l_1} + \cdots + v_{l_s} \;\middle|\; v_{l_k} \in S \right\} . \]


Abstract Example 


Say we have four mod-2 vectors (of any dimension), 
\[ S = \{ v_0, v_1, v_2, v_3 \} . \]


Because the scalars are only 0 or 1, vectors in the span are all possible sums of these
vectors. If one of the vectors doesn’t appear in a sum, that’s just a way to say its
corresponding weighting scalar is 0. If it does appear, then its weighting scalar is 1.
Here are some vectors in the span of S:


$0$ (the zero vector)

$v_2$

$v_0 + v_3$

$v_0 + v_1 + v_2 + v_3$

$v_1 + v_1$ (= 0, also the zero vector)


[Exercise. For a mod-2 vector space, how many vectors are in the span of the
empty set, $\emptyset$? How many are in the span of {0}? How many are in the span of a set
consisting of a single (specific) non-zero vector? How many are in the span of a set
of two (specific) linearly independent vectors? Bonus: How many in the span of
a set of m (specific) linearly independent vectors? Hint: If you’re stuck, the next
examples will help sort things out.]


Concrete Example 
Consider the two five-dimensional vectors in $(\mathbb{Z}_2)^5$,

\[ \{ (1, 0, 0, 1, 1), \; (1, 1, 1, 1, 0) \} . \]

Their span is all possible sums (and let’s not forget to include the zero vector):

0 = (0, 0, 0, 0, 0)

(1, 0, 0, 1, 1)

(1, 1, 1, 1, 0)

(1, 0, 0, 1, 1) + (1, 1, 1, 1, 0) = (0, 1, 1, 0, 1)
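
A brute-force way to list a mod-2 span is to add up every subset of the given vectors, the empty subset contributing the zero vector. The sketch below does that; the function names are hypothetical, and the two sample vectors, (1, 0, 0, 1, 1) and (1, 1, 1, 1, 0), are my reading of this example and should be treated as an assumption.

```python
from itertools import combinations


def add_mod2(v: tuple, w: tuple) -> tuple:
    """Component-wise mod-2 sum of two vectors."""
    return tuple(a ^ b for a, b in zip(v, w))


def span_mod2(vectors: list) -> set:
    """All mod-2 sums of subsets of the given (non-empty) list of vectors."""
    zero = (0,) * len(vectors[0])
    result = set()
    for r in range(len(vectors) + 1):
        for subset in combinations(vectors, r):
            total = zero
            for v in subset:
                total = add_mod2(total, v)
            result.add(total)
    return result


v0 = (1, 0, 0, 1, 1)   # assumed example vector
v1 = (1, 1, 1, 1, 0)   # assumed example vector
for vec in sorted(span_mod2([v0, v1])):
    print(vec)
```

Enumerating subsets costs 2^m additions for m vectors, which is fine for examples this size but not a technique for large sets.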


Two Linearly Independent Vectors 


A set of two mod-2 vectors, $\{v_0, v_1\}$ (of any dimension, say 10), is linearly independent
in the mod-2 sense when


• neither vector is zero (0), and
• the two vectors are distinct, i.e., $v_0 \neq v_1$.


[Exercise. Prove that this follows from the definition of linear independence using 
the observation that the only scalars available are 0 and 1.] 
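
For sets of more than two vectors, the two bullet conditions are no longer enough, and a general test is handy. The following sketch checks mod-2 linear independence by Gaussian elimination on bit patterns; it is one standard approach, not something taken from this text, and the function names and sample vectors are hypothetical.

```python
def rank_mod2(vectors: list) -> int:
    """Rank of a list of 0/1 tuples over the mod-2 scalars."""
    rows = [int("".join(str(b) for b in v), 2) for v in vectors]
    rank = 0
    while rows:
        pivot = max(rows)
        if pivot == 0:
            break                      # only zero rows remain
        rows.remove(pivot)
        rank += 1
        high = pivot.bit_length() - 1  # position of the pivot's leading 1
        # Clear that leading bit from every remaining row (XOR = mod-2 add).
        rows = [r ^ pivot if (r >> high) & 1 else r for r in rows]
    return rank


def is_independent_mod2(vectors: list) -> bool:
    """True when no non-trivial mod-2 combination gives the zero vector."""
    return rank_mod2(vectors) == len(vectors)


v0 = (1, 0, 0, 1, 1)
v1 = (1, 1, 1, 1, 0)
print(is_independent_mod2([v0, v1]))
print(is_independent_mod2([v0, v0]))
print(is_independent_mod2([(0, 0, 0, 0, 0), v1]))
```

For a pair of vectors the test agrees with the two bullet conditions: it fails exactly when one vector is zero or the two coincide.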




Finding a Third Independent Vector Relative to an Independent Set of Two


Consider the same two five-dimensional vectors in $(\mathbb{Z}_2)^5$,

\[ \{ (1, 0, 0, 1, 1), \; (1, 1, 1, 1, 0) \} . \]

This set is linearly independent as we know from the above observations. If we wanted
to add a third vector, w, to this set such that the augmented set of three vectors,

\[ \{ (1, 0, 0, 1, 1), \; (1, 1, 1, 1, 0), \; w \} , \]


would also be linearly independent, what would w have to look like? It is easiest
to describe the illegal vectors, i.e., those w which are linearly dependent on the two,
then make sure we avoid those. For w to be linearly dependent on these two, it would
have to be in their span. We already computed the span and found it to be the four
vectors

(0, 0, 0, 0, 0),
(1, 0, 0, 1, 1),
(1, 1, 1, 1, 0) and
(0, 1, 1, 0, 1).

Thus, a vector w is linearly independent of the original two exactly when it is not in 
that set, 


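$$\mathbf{w} \notin \{\, \mathbf{0},\ (1, 0, 0, 1, 0),\ (1, 1, 1, 1, 0),\ (0, 1, 1, 0, 0) \,\},$$


or taking the complement, when


$$\mathbf{w} \in \overline{\{\, \mathbf{0},\ (1, 0, 0, 1, 0),\ (1, 1, 1, 1, 0),\ (0, 1, 1, 0, 0) \,\}}.$$


(I used the over-bar, $\overline{\mathcal{E}}$, rather than the $\mathcal{E}^c$ notation to denote complement, since the
$^c$ tends to get lost in this situation.)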


How many vectors is this? Count all the vectors in the space ($2^5 = 32$) and
subtract the four that we know to be linearly dependent. That makes 32 − 4 = 28
such w independent of the original two.


Discussion. This analysis didn’t depend on which two vectors were in the original
linearly independent set. If we had started with any two distinct non-zero vectors,
they would be independent and there would be 28 ways to extend them to a set of
three independent vectors. Furthermore, the only role played by the dimension 5 was
that we subtracted the 4 from $2^5 = 32$ to get 28. If we had been working in the
7-dimensional $(\mathbb{Z}_2)^7$ and asked the same question starting with two specific vectors,
we would have arrived at the conclusion that there were $2^7 - 4 = 128 - 4 = 124$
vectors independent of the first two. If we started with two linearly independent
10-dimensional vectors, we would have gotten $2^{10} - 4 = 1024 - 4 = 1020$ choices. And if we
started in the space of 3-dimensional vectors, $(\mathbb{Z}_2)^3$, there would be $2^3 - 4 = 8 - 4 = 4$
independent w from which to choose.
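The pattern in the discussion (the count is always the size of the space minus 4) can be confirmed by brute force. This is a small sketch; the function name `count_outside_span` and the particular pair of independent vectors chosen in each dimension are assumptions made only for illustration.

```python
from itertools import product

def count_outside_span(v0, v1):
    """Count the vectors w that are NOT in the span of two independent mod-2 vectors."""
    dim = len(v0)
    zero = (0,) * dim
    v_sum = tuple((a + b) % 2 for a, b in zip(v0, v1))
    span = {zero, v0, v1, v_sum}          # the four combinations with scalars 0 and 1
    return sum(1 for w in product((0, 1), repeat=dim) if w not in span)

for dim in (3, 5, 7, 10):
    # Any two distinct non-zero vectors would do; these two are chosen arbitrarily.
    v0 = (1,) + (0,) * (dim - 1)
    v1 = (0, 1) + (0,) * (dim - 2)
    print(dim, count_outside_span(v0, v1), 2 ** dim - 4)
```

Each printed row should show the brute-force count agreeing with the closed form in the last column.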


[Exercise. If we had two independent vectors in the 2-dimensional $(\mathbb{Z}_2)^2$, how
many ways would there have been to select a third vector linearly dependent from
the first two?]


14.6.3. Probability Warm-Up: Counting 


We’re trying to learn how to count, a skill that every combinatorial mathematician 
needs to master, and one that even we computer scientists would do well to develop. 


Finding a Third Vector that is Not in the Span of Two Random Vectors 


Let’s describe an event that is more general than the one in the last example. 


$\mathcal{E}$ = the event that, after selecting two 5-dimensional mod-2 vectors
x and y at random, a third selection w will not be in the span of the first
two.


Notation. Since we are selecting vectors in a specific sequence, we’ll use the 
notation 


(x, y) 


to represent the event where x is the first pick and y is the second pick. (To count 
accurately, we must consider order, which is why we don’t use braces: “{” or “}.”) 
Similarly, the selection of the third w after the first two could be represented by 


(x, y, w). 


How many outcomes are in event $\mathcal{E}$?


This time, the first two vectors are not a fixed pair, nor are they required to be 
linearly independent. We simply want to know how many ways the third vector (or 
“qubit coin flip” if you are remembering how these vectors arose) can avoid being in 
the span of the first two. No other restrictions. 


We break $\mathcal{E}$ into two major cases:
1. x and y form a linearly independent set (mostly answered), and 
2. x and y are not linearly independent. This case contains three sub-cases: 


(i) Both x and y are 0, 
(ii) exactly one of x and y is 0, and 


(iii) x = y, but neither is 0. 




Let’s do the larger case 2 first, then come back and finish up what we started 
earlier to handle case 1. 


Harder Case 2. For x and y not linearly independent, we count each sub-case 
as follows. 


(i) There is only one configuration of x and y in this sub-case, namely $(\mathbf{0}, \mathbf{0})$. In
such a situation, the only thing we require of w is that it not be 0. There are
32 − 1 = 31 such w. Therefore, there are 31 simple events, $(\mathbf{0}, \mathbf{0}, \mathbf{w})$, in this
case.


(ii) In this sub-case there are 31 configurations of the form $(\mathbf{0}, \mathbf{y})_{\mathbf{y} \ne \mathbf{0}}$ and 31 of
the form $(\mathbf{x}, \mathbf{0})_{\mathbf{x} \ne \mathbf{0}}$. That’s a total of 62 ways that random choices of x and y
can lead us to this sub-case. Meanwhile, for each such configuration there are
32 − 2 = 30 ways for w to be different from x and y. Putting it together, there
are 62 · 30 = 1860 simple events in this sub-case.


(iii) There are 31 configurations of $\mathbf{x} = \mathbf{y} \ne \mathbf{0}$ in this sub-case, namely $(\mathbf{x}, \mathbf{x})_{\mathbf{x} \ne \mathbf{0}}$.
Meanwhile, for each such configuration any w that is neither 0 nor x will work.
There are 32 − 2 = 30 such w. Putting it together, there are 31 · 30 = 930
simple events in this sub-case.


Summarizing, the number of events with x and y not linearly independent and w not 
in their span is 31 + 1860 + 930 = 2821. 


Easier Case 1. We get into this situation with $(\mathbf{x}, \mathbf{y})_{\mathbf{x} \ne \mathbf{y} \ne \mathbf{0}}$ (and $\mathbf{x} \ne \mathbf{0}$). That means the
first choice can’t be 0 so there are 31 possibilities for x, and the second one can’t be
0 or x, so there are 30 choices left for y. That’s 31 · 30 ways to get into this major
case. Meanwhile, there are 28 ws not in the span for each of those individual outcomes
(result of last section), providing the linearly-independent case with 31 · 30 · 28 = 26040
simple events.


Combining Both Cases. We add the two major cases to get 2821 + 26040 =
28861 outcomes in event $\mathcal{E}$. If you are thinking of this as three five qubit coin flip
outcomes (15 individual flips, total), there are 28861 ways in which the third group
of five will be linearly independent of the first two.


Is this likely to happen? How many possible flips are there in the sample space
$\Omega$? The answer is that there are $2^{15}$ (or, if you prefer, $32^3$) = 32768. (The latter
expression comes from 3 five qubit events, each event coming from a set of $2^5 = 32$
outcomes.) That means 28861 of the 32768 possible outcomes will result in the third
outcome not being in the span of the first two. A simple division tells us that this
will happen 88% of the time.


A third vector selected at random from $(\mathbb{Z}_2)^5$ is far more likely to be
independent of any two previously selected vectors than it is to be in their
span.
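The hand count of 28861 can be double-checked by enumerating every ordered triple. The sketch below encodes each 5-component mod-2 vector as an integer from 0 to 31, so that mod-2 vector addition becomes bitwise XOR; that encoding is an implementation convenience assumed here, not something taken from the text.

```python
DIM = 5
SIZE = 2 ** DIM          # 32 vectors, encoded as the integers 0..31

count = 0
for x in range(SIZE):
    for y in range(SIZE):
        # Span of {x, y} over Z_2: all sums with coefficients 0 or 1.
        # The set collapses automatically when x or y is 0 or when x == y.
        span = {0, x, y, x ^ y}
        # Every w outside the span contributes one outcome (x, y, w).
        count += SIZE - len(span)

total = SIZE ** 3
print(count, total, round(count / total, 4))
```

Letting the set absorb duplicates handles all of the sub-cases (both zero, one zero, x = y) without separate branches.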


This was a seat-of-the-pants calculation. We’d better get some methodology so 
we don’t have to work so hard every time we need to count. 




14.7 Fundamental Probability Theory 


14.7.1 The Axioms 
Consider the set of all possible events of an experiment,

$$\text{Events} = \{\mathcal{E}\}.$$


A probability measure of an experiment is a function, $P(\,)$, with domain = Events and
range = non-negative real numbers,


$$P : \text{Events} \longrightarrow \mathbb{R}_{\ge 0},$$


which satisfies the following three axioms:


1. All $P(\,)$ values are non-negative (already stated),


$$P(\mathcal{E}) \ge 0.$$


2. The probability of something happening is certain,


$$P(\Omega) = 1.$$


3. The probabilities of mutually exclusive (disjoint) events, $\mathcal{E}_0, \ldots, \mathcal{E}_{n-1}$, can be
added,


$$P\left( \bigcup_{k=0}^{n-1} \mathcal{E}_k \right) = \sum_{k=0}^{n-1} P(\mathcal{E}_k).$$


[Exercise. Prove the following consequences from the above axioms:
- For any event, $P(\mathcal{E}) \le 1$.
- $P(\emptyset) = 0$.
- If $\mathcal{E} \subseteq \mathcal{F}$, $P(\mathcal{E}) \le P(\mathcal{F})$. ]


14.7.2 Assigning Probabilities in Finite, Equiprobable, Sample Spaces


Definitions 


It may not always be obvious how to assign probabilities to events even when they 
are simple events. However, when the sample space is finite and all simple events are 
equiprobable, we can always do it. We just count and divide. 




Caution. There is no way to prove that simple events are equiprobable. This 
is something we deduce by experiment. For example, the probability of a coin flip 
coming up tails (or the z-spin measurement of an electron in state $|0\rangle_x$ being “1”) is
said to be .5, but we don’t know that it is for sure. We conclude it to be so by doing 
lots of experiments. 


Size of an Event. The size of an event, written $|\mathcal{E}|$, is the number of
outcomes that constitute it,


$$|\mathcal{E}| \equiv \#\,\text{outcomes} \in \mathcal{E}.$$

Probability of an Event. The probability of an event, $\mathcal{E}$, when all
simple events are equiprobable, is


$$P(\mathcal{E}) = \frac{|\mathcal{E}|}{|\Omega|}.$$


This is not a definition, but a consequence of the axioms plus the assumption that 
simple events are finite and equiprobable. 
Example 1 


In the ten qubit coin flip, consider the event, $\mathcal{E}$, in which the first four results are
all equal. We compute the probability by counting how many simple events (or,
equivalently, how many outcomes) meet that criterion. What we know about these
events is that the first four are the same, so they are either all 0 or all 1.


All 0. The number of events in this case is the number of ten-tuples of the
form,


$$(0, 0, 0, 0, x_0, x_1, \ldots, x_5),$$


which we can see is the same as the number of integers of the form $x_0 x_1 x_2 x_3 x_4 x_5$.
That’s 000000 through 111111, or 0 → 63, a total of 64.


All 1. The number of events in this case is the number of ten-tuples of the
form,


$$(1, 1, 1, 1, x_0, x_1, \ldots, x_5),$$

also equal to 64.


So the number of outcomes in this event is 128. Meanwhile the sample space has
1024 outcomes, giving

$$P(\mathcal{E}) = \frac{|\mathcal{E}|}{|\Omega|} = \frac{128}{1024} = .125.$$


Example 2 


Next, consider the event $\mathcal{F}$ in which the first four individual flips come up the same
(= “Example 1”), but with the additional constraint that the fifth qubit be different
from those four. Now, the outcomes fall into the two categories


Four 0s followed by a 1.


$$(0, 0, 0, 0, 1, x_0, x_1, \ldots, x_4)$$


Four 1s followed by a 0.


$$(1, 1, 1, 1, 0, x_0, x_1, \ldots, x_4)$$


Again, we look at the number of possibilities for the “free range” bits $x_0, \ldots, x_4$,
which is 32 for each of the two categories, making the number of outcomes in this
event 32 + 32 = 64, so the probability becomes


$$P(\mathcal{F}) = \frac{|\mathcal{F}|}{|\Omega|} = \frac{64}{1024} = .0625.$$
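The "count and divide" rule used in Examples 1 and 2 translates directly into an enumeration over all 1024 outcomes of the ten qubit coin flip. The following is an illustrative sketch; the helper `prob` and the predicate names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ten-tuple of bits (2**10 = 1024 equiprobable outcomes).
outcomes = list(product((0, 1), repeat=10))

def prob(event):
    """P(event) = |event| / |sample space| for equiprobable simple events."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_four_equal(o):
    return len(set(o[:4])) == 1

def first_four_equal_fifth_differs(o):
    return first_four_equal(o) and o[4] != o[0]

p_e = prob(first_four_equal)                  # Example 1
p_f = prob(first_four_equal_fifth_differs)    # Example 2
print(p_e, float(p_e))
print(p_f, float(p_f))
```

Using `Fraction` keeps the results exact, so they can be compared with the reduced fractions 128/1024 and 64/1024.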


Example 3 


We do a five qubit coin flip twice. That is, we measure five quantum states once,
producing a mod-2 vector, $\mathbf{x} = (x_0, x_1, \ldots, x_4)$, then repeat, getting a second vector,
$\mathbf{y} = (y_0, y_1, \ldots, y_4)$. It’s like doing one ten qubit coin flip, but we are organizing
things naturally into two equal parts. Instead of the outcomes being single vectors
with ten mod-2 components, outcomes are pairs of vectors, each member of the pair
having five mod-2 components,


$$\Omega = \left\{ \big( (x_0, x_1, \ldots, x_4),\ (y_0, y_1, \ldots, y_4) \big) \ \middle|\ x_k, y_k \in \{0, 1\} \right\}$$


or, more concisely,

$$\Omega = \left\{ (\mathbf{x}, \mathbf{y}) \ \middle|\ \mathbf{x}, \mathbf{y} \in (\mathbb{Z}_2)^5 \right\}.$$


Consider the event, $\mathcal{I}$, in which the two vectors form a linearly independent set.


We’ve already discussed the exact conditions for a set of two mod-2 vectors to be
linearly independent:


• neither vector is zero ($\mathbf{0}$), and


• the two vectors are distinct, i.e., $\mathbf{x} \ne \mathbf{y}$.


We compute $P(\mathcal{I})$ by counting events. For an outcome $(\mathbf{x}, \mathbf{y}) \in \mathcal{I}$, x can be
any non-0 five-tuple, $(x_0, x_1, \ldots, x_4)$, and y must be different from both 0 and x.




That makes 31 vectors $\mathbf{x} \ne \mathbf{0}$, each supporting 30 linearly independent ys. That’s
31 × 30 = 930 outcomes in $\mathcal{I}$. $|\Omega|$ continues to be 1024, so


$$P(\mathcal{I}) = \frac{|\mathcal{I}|}{|\Omega|} = \frac{930}{1024} \approx .908.$$


As you can see, it is very likely that two five qubit coin flips will produce a linearly
independent set; it happens > 90% of the time.
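As a check on Example 3, the sketch below tests linear independence of each pair straight from the definition (no non-trivial combination with scalars 0 and 1 gives the zero vector) rather than from the two-bullet shortcut, and then counts. Vectors are encoded as integers 0 to 31 with XOR as mod-2 addition; that encoding and the function name are assumptions for illustration.

```python
from fractions import Fraction

SIZE = 32   # number of vectors in (Z_2)^5, encoded as integers 0..31

def independent_pair(x, y):
    """True when no non-trivial combination c0*x + c1*y (mod 2) is the zero vector."""
    for c0 in (0, 1):
        for c1 in (0, 1):
            if (c0, c1) == (0, 0):
                continue                      # skip the trivial combination
            combo = (x if c0 else 0) ^ (y if c1 else 0)
            if combo == 0:
                return False
    return True

favorable = sum(1 for x in range(SIZE) for y in range(SIZE) if independent_pair(x, y))
p_independent = Fraction(favorable, SIZE * SIZE)
print(favorable, p_independent, round(float(p_independent), 3))
```

The definition-based test and the shortcut (both non-zero and distinct) select the same pairs, which is the content of the earlier exercise on two mod-2 vectors.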


[Exercise. Revise the above example so that we take three, rather than two, five
qubit coin flips. Now the sample space is all triples of these five-tuples, (x, y, w).
What is the probability of the event, $\mathcal{T}$, that all three “flip-tuples” are linearly
independent? Hint: We already covered the more lenient case in which x and y were
allowed to be any two vectors and w was not in their span. Repeat that analysis but
exclude the cases where x and y formed a linearly dependent set.]


[Exercise. Make up three interesting event descriptions in this experiment and 
compute the probability of each.] 


14.8 Big Theorems and Consequences of the Probability Axioms


14.8.1 Unions 


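The third axiom tells us about the probability of unions of disjoint events, $\{\mathcal{E}_k\}$,
namely,


$$P\left( \bigcup_{k=0}^{n-1} \mathcal{E}_k \right) = \sum_{k=0}^{n-1} P(\mathcal{E}_k).$$


What happens when the events are not mutually exclusive? A simple diagram in the
case of two events tells the story.


[Figure: Venn diagram of two overlapping events, $\mathcal{E}$ and $\mathcal{F}$.]


If we were to add the probabilities of two events
$\mathcal{E}$ and $\mathcal{F}$ which had non-empty intersection, we would be counting the intersection
twice. To fix the error we just subtract off the probability of the intersection,


$$P(\mathcal{E} \cup \mathcal{F}) = P(\mathcal{E}) + P(\mathcal{F}) - P(\mathcal{E} \cap \mathcal{F}).$$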




When there are more than two sets, the intersections involve more combinations and 
get harder to write out. But the concept is the same, and all we need to know is that 


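$$P\left( \bigcup_{k=0}^{n-1} \mathcal{E}_k \right) = \sum_{k=0}^{n-1} P(\mathcal{E}_k) \ - \ P(\text{various intersections}).$$


The reason this is always enough information is that we will be using the formula to
bound the probability from above, so the equation, as vague as it is, clearly implies


$$P\left( \bigcup_{k=0}^{n-1} \mathcal{E}_k \right) \le \sum_{k=0}^{n-1} P(\mathcal{E}_k).$$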
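Both the two-event union formula and the upper bound can be illustrated on the ten qubit coin flip. In this sketch the two overlapping events ("first four flips equal" and "last four flips equal") are chosen only for illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product

omega = set(product((0, 1), repeat=10))          # all 1024 outcomes

E = {o for o in omega if len(set(o[:4])) == 1}   # first four flips equal
F = {o for o in omega if len(set(o[-4:])) == 1}  # last four flips equal

def P(event):
    return Fraction(len(event), len(omega))

p_union = P(E | F)                               # set union
p_incl_excl = P(E) + P(F) - P(E & F)             # subtract the double-counted overlap
print(p_union, p_incl_excl, P(E) + P(F))
```

The first two printed values should agree, and both should be no larger than the plain sum in the third position, which is the bound used later.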


14.8.2 Conditional Probability and Bayes’ Law 
Conditional Probability 


Very often we will want to know the probability of some event, $\mathcal{E}$, under the
assumption that another event, $\mathcal{F}$, is true. For example, we might want to know the
probability that three quantum coin flips are linearly independent under the
assumption that the first two are (known to be) linearly independent. The notation for the
event “$\mathcal{E}$ given $\mathcal{F}$” is


$$\mathcal{E} \,|\, \mathcal{F},$$

and the notation for the event’s probability is

$$P(\mathcal{E} \,|\, \mathcal{F}).$$


This is something we can count using common sense. Start with our formula for an
event in a finite sample space of equiprobable simple events (always the setting for
us),


$$P(\mathcal{E}) = \frac{|\mathcal{E}|}{|\Omega|}.$$


Next, think about what it means to say “under the assumption that another event,
$\mathcal{F}$, is true.” It means that our sample space, $\Omega$, suddenly shrinks to $\mathcal{F}$, and in that
smaller sample space, we are interested in the probability of the event $\mathcal{E} \cap \mathcal{F}$, so


$$P(\mathcal{E} \,|\, \mathcal{F}) = \frac{|\mathcal{E} \cap \mathcal{F}|}{|\mathcal{F}|}, \quad \text{whenever } \mathcal{F} \ne \emptyset.$$


Of course, if $\mathcal{F} = \emptyset$ everything is zero so there is no formula needed.




Bayes’ Law 


Study this last expression until it makes sense. Once you have it, divide the top 
and bottom by the size of our original sample space, |Q|, to produce the equivalent 
identity, 


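$$P(\mathcal{E} \,|\, \mathcal{F}) = \frac{P(\mathcal{E} \cap \mathcal{F})}{P(\mathcal{F})}, \quad \text{whenever } P(\mathcal{F}) > 0.$$


This is often taken to be the definition of conditional probability, but we can view
it as a natural consequence of the meaning of the phrase “$\mathcal{E}$ given $\mathcal{F}$.” It is also a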
simplified form of Bayes’ law, and I will often refer to this as Bayes’ law (or rule or 
formula), since this simple version is all we will ever need. 
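The "shrinking sample space" view and the ratio-of-probabilities form can be compared numerically on the example mentioned earlier: three five qubit coin flips, conditioned on the first two being linearly independent. This is a sketch under the assumption that vectors are encoded as integers 0 to 31 with XOR as mod-2 addition; the function names are hypothetical.

```python
from fractions import Fraction

SIZE = 32
omega = [(x, y, w) for x in range(SIZE) for y in range(SIZE) for w in range(SIZE)]

def first_two_independent(o):
    x, y, _ = o
    return x != 0 and y != 0 and x != y

def all_three_independent(o):
    x, y, w = o
    # w must avoid the span of the (independent) pair x, y.
    return first_two_independent(o) and w not in {0, x, y, x ^ y}

F = [o for o in omega if first_two_independent(o)]
E_and_F = [o for o in F if all_three_independent(o)]

# Counting inside the shrunken sample space F ...
by_counting = Fraction(len(E_and_F), len(F))
# ... versus dividing probabilities measured in the full space.
by_ratio = Fraction(len(E_and_F), len(omega)) / Fraction(len(F), len(omega))
print(by_counting, by_ratio)
```

Dividing numerator and denominator by the size of the full sample space is the only difference between the two computations, so they must agree.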


14.8.3 Statistical Independence 
Two Independent Events 


Intuitively, two events are statistically independent — or simply independent — if they 
don’t affect one another. If one is known to be true, the probability of the other is 
unaffected. Stated in terms of conditional probabilities, this is nothing more than the 
declaration that 


$$P(\mathcal{E} \,|\, \mathcal{F}) = P(\mathcal{E}),$$

or, looking at the mirror image,

$$P(\mathcal{F} \,|\, \mathcal{E}) = P(\mathcal{F}).$$

If we substitute the first of these into Bayes’ law we find


$$P(\mathcal{E}) = \frac{P(\mathcal{E} \cap \mathcal{F})}{P(\mathcal{F})}, \quad \text{whenever } P(\mathcal{F}) > 0,$$


and rearranging gets us

$$P(\mathcal{E} \cap \mathcal{F}) = P(\mathcal{E})\, P(\mathcal{F}),$$


which, by the way, is true even for the degenerate case, $\mathcal{F} = \emptyset$. This is the official
definition of statistically independent events, and working backwards you would derive
the intuitive meaning that we started with. In words, two events are independent ⟺


“the probability of the intersection is equal to the product of the probabilities.”


Multiple Independent Events 


The idea carries over to any number of events, although the notation becomes thorny. 
It’s easier to first say it in words, then show the formula. In words, 


n events are independent
⟺
“the probability of the intersection [of any subset of the n events] is
equal to the product of the probabilities [of events in that subset].”


Formulaically, we have to resort to double indexing.


n events, $\{\mathcal{E}_k\}_{k=0}^{n-1}$, are independent


⟺


$$P\left( \bigcap_{i=0}^{l} \mathcal{E}_{k_i} \right) = \prod_{i=0}^{l} P(\mathcal{E}_{k_i}) \quad \text{for every choice of indexes } 0 \le k_0 < k_1 < \cdots < k_l < n.$$

14.8.4 Other Forms and Examples 


I'll list a few useful consequences of the definitions and the formulas derived above. 
Unless otherwise noted, they are all easy to prove and you can select any of these as 
an exercise. 


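Events

$$(\mathcal{E} \cap \mathcal{F})^c = \mathcal{E}^c \cup \mathcal{F}^c$$
$$\mathcal{E} \cap (\mathcal{F} \cup \mathcal{G}) = (\mathcal{E} \cap \mathcal{F}) \cup (\mathcal{E} \cap \mathcal{G})$$

Probabilities

$$P(\mathcal{E}) = P(\mathcal{E} \cap \mathcal{F}) + P(\mathcal{E} \cap \mathcal{F}^c)$$
$$P(\mathcal{E} \cap \mathcal{F}) = P(\mathcal{E} \,|\, \mathcal{F})\, P(\mathcal{F})$$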
Example 1 


In our ten qubit coin flip, what is the probability that the 3rd, 6th and 9th flips are 
identical? 
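We’ll call the event $\mathcal{E}$. It is the union of two disjoint events, the first requiring
that all three flips be 0, $\mathcal{Z}$, and the second requiring that all three be 1, $\mathcal{O}$,

$$\mathcal{E} = \mathcal{Z} \cup \mathcal{O}, \ \text{ with } \ \mathcal{Z} \cap \mathcal{O} = \emptyset \quad \Longrightarrow \quad P(\mathcal{E}) = P(\mathcal{Z}) + P(\mathcal{O}).$$


There is no mathematical difference between $\mathcal{Z}$ and $\mathcal{O}$, so we compute $P(\,)$ of either
one, say $\mathcal{O}$. It is the intersection of the three statistically independent events, namely
that the 3rd, 6th and 9th flips, individually, come up 1, which we call $\mathcal{O}_3$, $\mathcal{O}_6$ and
$\mathcal{O}_9$, respectively. The probability of each is, of course, .5, so we get,


$$
\begin{aligned}
P(\mathcal{O}) &= P(\mathcal{O}_3 \cap \mathcal{O}_6 \cap \mathcal{O}_9) \\
               &= P(\mathcal{O}_3)\, P(\mathcal{O}_6)\, P(\mathcal{O}_9) \\
               &= (.5)(.5)(.5) = .125.
\end{aligned}
$$

Plugging into the sum, we get

$$P(\mathcal{E}) = P(\mathcal{Z}) + P(\mathcal{O}) = .125 + .125 = .25.$$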
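The independence claim for the three single-flip events, and the final answer of .25, can both be verified by enumeration. The sketch below checks the product rule for every subset of two or three of the events, following the multiple-event definition; the names `ones`, `P` and `mutually_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product((0, 1), repeat=10))

def P(pred):
    return Fraction(sum(1 for o in omega if pred(o)), len(omega))

# Flips are numbered from 1 in the text, so the 3rd flip is tuple index 2, etc.
ones = {k: (lambda o, k=k: o[k - 1] == 1) for k in (3, 6, 9)}

# Independence: P(intersection) == product of probabilities for every sub-collection.
mutually_independent = True
for r in (2, 3):
    for subset in combinations(ones, r):
        joint = P(lambda o: all(ones[k](o) for k in subset))
        product_of_probs = Fraction(1)
        for k in subset:
            product_of_probs *= P(ones[k])
        mutually_independent = mutually_independent and (joint == product_of_probs)

p_identical = P(lambda o: o[2] == o[5] == o[8])
print(mutually_independent, p_identical)
```

Checking every sub-collection, not just the full triple, is what the double-indexed definition requires.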


Example 2 


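We do the five qubit coin flip five times. We examine the probability of the event $\mathcal{I}_5$
defined as “the five five-tuples are linearly independent.” Our idea is to write $P(\mathcal{I}_5)$
as a product of conditional probabilities. Let


$$\mathcal{I}_j \equiv \text{event that the first } j \text{ vector outcomes are linearly independent.}$$

Our goal is to compute the probability of $\mathcal{I}_5$.
(It will now be convenient to use the over-bar notation $\overline{\mathcal{E}}$ to denote complement.)
Combining the basic identity,

$$P(\mathcal{I}_5) = P(\mathcal{I}_5 \cap \mathcal{I}_4) + P(\mathcal{I}_5 \cap \overline{\mathcal{I}_4})$$

(exercise: why?), with the observation that

$$P(\mathcal{I}_5 \cap \overline{\mathcal{I}_4}) = 0$$

(exercise: why?), we can write

$$P(\mathcal{I}_5) = P(\mathcal{I}_5 \cap \mathcal{I}_4) = P(\mathcal{I}_5 \,|\, \mathcal{I}_4)\, P(\mathcal{I}_4)$$

(exercise: why?).
Now apply that same process to the right-most factor repeatedly, and you end up
with

$$
\begin{aligned}
P(\mathcal{I}_5) &= P(\mathcal{I}_5 \,|\, \mathcal{I}_4)\, P(\mathcal{I}_4 \,|\, \mathcal{I}_3)\, P(\mathcal{I}_3 \,|\, \mathcal{I}_2)\, P(\mathcal{I}_2 \,|\, \mathcal{I}_1)\, P(\mathcal{I}_1) \\
                 &= \prod_{j=1}^{5} P(\mathcal{I}_j \,|\, \mathcal{I}_{j-1}),
\end{aligned}
$$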


where the $j = 1$ term contains the curious event, $\mathcal{I}_0$. This corresponds to no coin-flip
= no vector being selected or tested. It’s different from the zero vector = $(0, 0, 0, 0, 0)^t$,
which is an actual flip possibility; $\mathcal{I}_0$ is no flip at all, i.e., the empty set, $\emptyset$. But we
know that $\emptyset$ is linearly independent always because it vacuously satisfies the condition
that one cannot produce the zero vector as a linear combination of vectors from the
set, since there are no vectors to combine. So $P(\mathcal{I}_0) = 1$. Thus, the last factor in
the product is just


$$P(\mathcal{I}_1 \,|\, \mathcal{I}_0) = P(\mathcal{I}_1),$$


but it’s cleaner to include the conditional probability (LHS of above) in all factors
when using product notation.


[Note. We’ll be computing the individual factors in Simon’s algorithm. This is
as far as we need to take it today.]
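The product-of-conditionals identity can be sanity-checked on a space small enough to enumerate. The sketch below deliberately uses three 3-component vectors instead of five 5-component ones (an assumption made purely to keep the enumeration to 512 outcomes); it counts each conditional factor inside its shrunken sample space and compares the product with a direct count.

```python
from fractions import Fraction
from itertools import product

DIM = 3                   # assumed small dimension; the text's example uses 5
SIZE = 2 ** DIM           # vectors encoded as integers, XOR is mod-2 addition

def independent(vectors):
    """True when the given mod-2 vectors form a linearly independent set."""
    span = {0}                            # span of the empty set is just the zero vector
    for v in vectors:
        if v in span:                     # v is already a combination of earlier vectors
            return False
        span |= {s ^ v for s in span}     # extend the span by the new vector
    return True

omega = list(product(range(SIZE), repeat=DIM))     # all ordered triples of vectors

def count_I(j):
    """Number of outcomes whose first j vectors are independent (the event I_j)."""
    return sum(1 for o in omega if independent(o[:j]))

# P(I_j | I_{j-1}) counted in the shrunken sample space I_{j-1}; I_j is a subset of it.
factors = [Fraction(count_I(j), count_I(j - 1)) for j in range(1, DIM + 1)]

chain = Fraction(1)
for f in factors:
    chain *= f

direct = Fraction(count_I(DIM), len(omega))
print(factors, chain, direct)
```

Note that `independent([])` returns True, matching the observation that the empty set is always linearly independent.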


14.9 Wedge and Vee Notation and Lecture Recap 


Some texts and papers use the alternate notation, $\wedge$ instead of $\cap$ for
intersections, and $\vee$ instead of $\cup$ for unions. When $\wedge$ and $\vee$ are used, $\neg$ is typically the
negation (complement) operator of choice, so the three go together. This is seen more
commonly in electrical engineering and computer science than math and physics. I
introduced the concepts in this lesson using more traditional $\cap$, $\cup$, and $^c$ notation but
we’ll use both in upcoming lectures. Normally, I will reserve $\cap$, $\cup$, and $^c$ for non-event
sets (like sets of integers) and $\wedge$, $\vee$, and $\neg$ for events.


We'll take this opportunity to repeat some of the more important results needed 
in future lectures using the new notation. 


14.9.1 Disjoint Events 


The probabilities of mutually exclusive (disjoint) events, $\mathcal{E}_0, \ldots, \mathcal{E}_{n-1}$, can be added,

$$P\left( \bigvee_{k=0}^{n-1} \mathcal{E}_k \right) = \sum_{k=0}^{n-1} P(\mathcal{E}_k).$$

14.9.2 Partition of the Sample Space by Complements. 


Any event, $\mathcal{F}$, and its complement, $\neg\mathcal{F}$, partition the space, leading to the identity


$$P(\mathcal{E}) = P(\mathcal{E} \wedge \mathcal{F}) + P(\mathcal{E} \wedge \neg\mathcal{F}).$$




14.9.3. Bayes’ Law. 


Bayes’ law (or rule or formula) in its simple, special case, can be expressed as

$$P(\mathcal{E} \,|\, \mathcal{F}) = \frac{P(\mathcal{E} \wedge \mathcal{F})}{P(\mathcal{F})}, \quad \text{whenever } P(\mathcal{F}) > 0.$$


14.9.4 Statistical Independence 


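n events, $\{\mathcal{E}_k\}_{k=0}^{n-1}$, are statistically independent ⟺


$$P\left( \bigwedge_{i} \mathcal{E}_{k_i} \right) = \prod_{i} P(\mathcal{E}_{k_i})$$

for any subset of the indexes, $\{k_i\} \subseteq \{0, 1, \ldots, n-1\}$.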


This completes the probability theory results needed for the Foothill College 
courses in Quantum computing. You are now fully certified to study the statisti- 
cal aspects of the algorithms. 


14.10 Application to Deutsch-Jozsa 


In a recent lecture, I gave you an informal argument that the Deutsch-Jozsa problem 
had a constant time solution using a classical algorithm if we accept a small but 
constant error. We now have the machinery to give a rigorous derivation and also 
show two different sampling strategies. It is a bit heavy handed to apply such rigorous 
machinery to what seemed to be a relatively obvious result, but the practice that we 
get provides a good warm-up for times when the answers are not so obvious. Such 
times await in our upcoming lessons on Simon’s and Shor’s algorithms. 


First, a summary of the classical Deutsch-Jozsa algorithm: 


The M-and-Guess Algorithm. Let M be some positive integer. Given a
Boolean function, $f(x)$ of $x \in [0, 2^n - 1]$ which is either balanced or constant, we
evaluate $f(x)$ M times, each time at a random $x \in [0, 2^n - 1]$. If we get two different
outputs, $f(x') \ne f(x'')$, we declare f balanced. If we get the same output for all
trials, we declare f constant.
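As a concrete reading of the algorithm just summarized, here is a minimal Python sketch. It assumes f is supplied as an ordinary Python function on the integers 0 through 2^n − 1; the function name `m_and_guess`, the `rng` parameter and the sample functions are hypothetical illustrations, not part of the course code.

```python
import random

def m_and_guess(f, n, M, rng=random):
    """Classical M-and-guess: evaluate f at M random inputs (repeats allowed)."""
    first = f(rng.randrange(2 ** n))          # first sample sets the reference output
    for _ in range(M - 1):
        if f(rng.randrange(2 ** n)) != first:
            return "balanced"                 # two different outputs seen
    return "constant"                         # all M outputs agreed

def constant_f(x):
    return 0                                  # a constant function

def balanced_f(x):
    return x & 1                              # balanced: half of the inputs have low bit 1

rng = random.Random(0)                        # seeded only to make the demo repeatable
print(m_and_guess(constant_f, n=4, M=10, rng=rng))
print(m_and_guess(balanced_f, n=4, M=50, rng=rng))
```

The loop cost depends on M alone and never on n, which is the sense in which the classical method runs in constant time.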


The only way we fail is if the function is balanced (event $\mathcal{B}$) yet all M trials
produce the same output (event $\mathcal{S}$). This is the event


$$\mathcal{S} \wedge \mathcal{B},$$


whose probability we must compute.


Assumption. Since we only care about the length of the algorithm as the number
of possible encoded inputs, $2^n$, gets very large, there is no loss of generality if we
assume $M < 2^n$. To be sure, we can adjust the algorithm so that in those cases in
which $M \ge 2^n$ we sample f non-randomly at all $2^n$ input values. In those cases we
will know f completely after the M trials and will have zero error. It is only when
$M < 2^n$ that we have to do the analysis.




14.10.1 Sampling with Replacement 


The way the algorithm is stated, we are admitting the possibility of randomly selecting 
the same x more than once during our M trials. This has two consequences in the 
balanced case — the only case that could lead to error: 


1. If we have a balanced function, it results in a very simple and consistent
probability for obtaining a 1 (or 0) in any single trial, namely $P = \frac{1}{2}$.


2. It is a worst case scenario. It produces a larger error than we would get if we 
were to recast the algorithm to be smarter. So, if we can guarantee a small 
constant error for all n in this case it will be even better if we improve the 
algorithm slightly. 


The algorithm as stated uses a “sampling with replacement” technique and is the 
version that I summarized in the original presentation. We’ll dispatch that rigorously 
first and move on to a smarter algorithm in the section that follows. 


14.10.2 Analysis Given a Balanced f 


The experiment consists of sampling f M times then announcing our conclusion: 
balanced or constant. We wish to compute the probability of error, which we take 
to mean the unlucky combination of being handed a balanced function yet observing 
either all 0s or all 1s in our M samples. Before attacking the desired probability of
error,


$$P(\mathcal{S} \wedge \mathcal{B}),$$


we consider the case in which we are given a balanced function, the only way we
could fail. The probability to be computed in that case is expressed by


$$P(\mathcal{S} \,|\, \mathcal{B}).$$


So we first do our counting under the assumption that we have a balanced function. 
In this context, we are not asking about a “wrong guess,” since under this assumption 
there is no guessing going on; we know we have a balanced function. Rather, we are 
computing the probability that, given a balanced function, we get M identical results 
in our M trials. 


Another way to approach this preliminary computation is to declare our sample
space, $\Omega$, to be all experimental outcomes based on a balanced function. Shining the
light on this interpretation, we get to leave out the phrase “given that f is balanced”
throughout this section. Our choice of sample space implies it.




Notation for Outcomes Yielding a One 


Let $\mathcal{O}_k$ be the event that [we observe f(x) = 1 on the kth trial]. (Script $\mathcal{O}$ stands
for One.) Because f is balanced (in this section),


$$P(\mathcal{O}_k) = \frac{1}{2}.$$


Let $\mathcal{O}$ (no index) be the unlucky event that [we get f(x) = 1 for all M trials,
k = 0, ..., M − 1]. This corresponds to the algebraic identity

$$\mathcal{O} = \bigwedge_{k=0}^{M-1} \mathcal{O}_k.$$


Notation for Outcomes Yielding a Zero


Likewise, let $\mathcal{Z}_k$ be the event that [we observe f(x) = 0 on the kth trial]. (Script
$\mathcal{Z}$ for Zero.) Of course, we also have


$$P(\mathcal{Z}_k) = \frac{1}{2}.$$


Finally, let $\mathcal{Z}$ (no index) be the event that [we get f(x) = 0 for all M trials].


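The Probability of M Identical Readings


The event $\mathcal{S}$ means all outcomes were the Same, i.e., we got either all 1s or all 0s.
$\mathcal{S}$ is the disjoint union,


$$\mathcal{S} = \mathcal{O} \vee \mathcal{Z} \quad \text{(disjoint)},$$

so

$$
\begin{aligned}
P(\mathcal{S}) &= P(\mathcal{O} \vee \mathcal{Z}) = P(\mathcal{O}) + P(\mathcal{Z}) \\
               &= 2 P(\mathcal{O}) = 2 P\left( \bigwedge_{k=0}^{M-1} \mathcal{O}_k \right) = 2 \prod_{k=0}^{M-1} P(\mathcal{O}_k).
\end{aligned}
$$

The first line uses the mutual exclusivity of $\mathcal{O}$ and $\mathcal{Z}$, while the last line relies on
the statistical independence of the $\{\mathcal{O}_k\}$. Plugging in $\frac{1}{2}$ for the terms in the product,
we find that


$$P(\mathcal{S}) = 2 \prod_{k=0}^{M-1} \frac{1}{2} = 2 \left( \frac{1}{2} \right)^{M} = \frac{1}{2^{M-1}}.$$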
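For a small domain the probability of M identical readings can be computed exactly by listing every possible sequence of M draws with replacement. The sketch below assumes n = 3 input bits and uses the low bit of x as a stand-in balanced function; both are illustrative choices, and `p_same` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

n = 3                       # assumed small so the enumeration stays tiny
N = 2 ** n                  # number of inputs

def f(x):
    return x & 1            # a stand-in balanced function: 1 on exactly half the inputs

def p_same(M):
    """Exact P(all M outputs identical) when sampling x with replacement."""
    draws = list(product(range(N), repeat=M))       # ordered samples, repeats allowed
    same = sum(1 for xs in draws if len({f(x) for x in xs}) == 1)
    return Fraction(same, len(draws))

for M in (1, 2, 3, 4):
    print(M, p_same(M), Fraction(1, 2 ** (M - 1)))
```

Each row should show the enumerated probability matching the closed form in the last column.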




14.10.3 Completion of Sampling with Replacement for Unknown f


We might be tempted to say that $\frac{1}{2^{M-1}}$ is the probability of failure in our M-and-
guess algorithm, but that would be rash. We computed it under the assumption of
a balanced f. To see the error clearly, imagine that our function provider gives us a
balanced function only 1% of the time. For the other 99% of the functions, when we
guess “constant” we will be doing so on a constant function and will be correct; our
chances of being wrong in this case are diminished to


$$.01 \times \frac{1}{2^{M-1}}.$$


To see how this comes out of our theory, we give the correct expression for a wrong
guess. Let $\mathcal{W}$ be the event consisting of a wrong guess. It happens when both f is
balanced ($\mathcal{B}$) AND we get M identical results ($\mathcal{S}$),


$$\mathcal{W} = \mathcal{S} \wedge \mathcal{B}.$$

Bayes’ law gives us the proper path,

$$P(\mathcal{W}) = P(\mathcal{S} \wedge \mathcal{B}) = P(\mathcal{S} \,|\, \mathcal{B})\, P(\mathcal{B}).$$

It was actually $P(\mathcal{S} \,|\, \mathcal{B})$, not $P(\mathcal{W})$, that we computed in the last section, so we
plug in that number,


$$P(\mathcal{W}) = \left( \frac{1}{2^{M-1}} \right) P(\mathcal{B}),$$


and now understand that the probability of being wrong is attenuated by $P(\mathcal{B})$,
which is usually taken to be 1/2 due to fair sampling. But understand that the world
is not always fair, and we may be given more of one kind of function than the other.
Using the usual value for $P(\mathcal{B})$, this yields the proper probability of guessing wrong,


$$P(\mathcal{W}) = \left( \frac{1}{2^{M-1}} \right) \left( \frac{1}{2} \right) = \frac{1}{2^{M}}.$$


There are many different versions of how one samples the function in a classical
Deutsch-Jozsa solution, so other mechanisms might yield a different estimate. However,
they are all used as upper bounds and all give the same end result: classical
methods can get the right answer with exponentially small error in constant time.


14.10.4 Sampling without Replacement 
Now, let’s make the obvious improvement of avoiding duplicate evaluations of f for
the randomly selected x. Once we generate a random x, we somehow take it out
of contention for the next random selection. (I won’t go into the details on how we
guarantee this, but you can surmise that it will only add a constant time penalty,
something independent of n, to the algorithm.)


[Exercise. Write an algorithm that produces M distinct x values at random.
You can assume a random number generator that returns many more than M distinct
values (with possibly some repeats), since a typical random number generator will
return at least 32,768 different ints, while M is on the order of 20 or 50. Hint: It’s
okay if your algorithm depends on M, since M is not a function of n. It could even
be on the order of $M^2$ or $M^3$, so long as it does not rely on n.]


Intuitively, this should reduce the error because every time we remove another x
whose f(x) = 1 from our pot, there are fewer xs capable of leading to that 1, while
the full complement of $\frac{2^n}{2}$ xs that give f(x) = 0 are still present. Thus, chances of
getting f(x) = 1 on future draws should diminish each trial from the original value of
$\frac{1}{2}$. We prove this by doing a careful count.


Analysis Given a Balanced f 


As before, we will do the heavy lifting assuming the sample space $\Omega = \mathcal{B}$, and when
we’re done we can just multiply by 1/2 (which assumes equally likely reception of
balanced vs. constant functions).


M = 2 Samples 


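The $\{\mathcal{O}_k\}$ are no longer independent. To see how to deal with that, we’ll look at
M = 2 which only contains the two events $\mathcal{O}_0$ and $\mathcal{O}_1$.


The first trial is the easiest. Clearly,

$$P(\mathcal{O}_0) = \frac{1}{2}.$$


Incorporating the second trial requires conditional probability. We compute with care.
Our interest is $P(\mathcal{O})$, and


$$P(\mathcal{O}) = P(\mathcal{O}_1 \wedge \mathcal{O}_0) = P(\mathcal{O}_1 \,|\, \mathcal{O}_0)\, P(\mathcal{O}_0).$$


What is $P(\mathcal{O}_1 \,|\, \mathcal{O}_0)$? We get it by counting. There is one fewer x capable of producing
f(x) = 1 (specifically, $\frac{2^n}{2} - 1$), and the total number of xs from which to choose
is also smaller by one, now at $2^n - 1$, so


$$P(\mathcal{O}_1 \,|\, \mathcal{O}_0) = \frac{\frac{2^n}{2} - 1}{2^n - 1} = \frac{2^{n-1} - 1}{2^n - 1},$$


making


$$P(\mathcal{O}) = P(\mathcal{O}_1 \,|\, \mathcal{O}_0)\, P(\mathcal{O}_0) = \left( \frac{2^{n-1} - 1}{2^n - 1} \right) \left( \frac{1}{2} \right).$$


Let’s write $\frac{1}{2}$ in a more complicated (but also more suggestive) way, to give

$$P(\mathcal{O}) = P(\mathcal{O}_1 \wedge \mathcal{O}_0) = \left( \frac{2^{n-1} - 1}{2^n - 1} \right) \left( \frac{2^{n-1}}{2^n} \right).$$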


M = 3 Samples 


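You may be able to guess what will happen next. Here we go.


$$P(\mathcal{O}) = P(\mathcal{O}_2 \wedge \mathcal{O}_1 \wedge \mathcal{O}_0) = P(\, \mathcal{O}_2 \,|\, [\mathcal{O}_1 \wedge \mathcal{O}_0] \,)\, P(\mathcal{O}_1 \wedge \mathcal{O}_0).$$


We know the value $P(\mathcal{O}_1 \wedge \mathcal{O}_0)$, having computed it in the M = 2 case, so direct
your attention to the factor $P(\, \mathcal{O}_2 \,|\, [\mathcal{O}_1 \wedge \mathcal{O}_0] \,)$. The event $\mathcal{O}_1 \wedge \mathcal{O}_0$ means there are
now two fewer xs left that would cause f(x) = 1, and the total number of xs from
which to choose now stands at $2^n - 2$, so


$$P(\, \mathcal{O}_2 \,|\, [\mathcal{O}_1 \wedge \mathcal{O}_0] \,) = \frac{\frac{2^n}{2} - 2}{2^n - 2} = \frac{2^{n-1} - 2}{2^n - 2}.$$


Substituting into our expression for $P(\mathcal{O})$ we get

$$P(\mathcal{O}) = P(\mathcal{O}_2 \wedge \mathcal{O}_1 \wedge \mathcal{O}_0) = \left( \frac{2^{n-1} - 2}{2^n - 2} \right) \left( \frac{2^{n-1} - 1}{2^n - 1} \right) \left( \frac{2^{n-1}}{2^n} \right).$$


The General Case of M Samples


It should now be clear that after M trials,


$$
\begin{aligned}
P(\mathcal{O}) &= P(\mathcal{O}_{M-1} \wedge \cdots \wedge \mathcal{O}_2 \wedge \mathcal{O}_1 \wedge \mathcal{O}_0) \\
               &= \left( \frac{2^{n-1} - (M-1)}{2^n - (M-1)} \right) \cdots \left( \frac{2^{n-1} - 2}{2^n - 2} \right) \left( \frac{2^{n-1} - 1}{2^n - 1} \right) \left( \frac{2^{n-1}}{2^n} \right),
\end{aligned}
$$


or, more compactly,


$$P(\mathcal{O}) = \prod_{k=0}^{M-1} \frac{2^{n-1} - k}{2^n - k}.$$


This covers the eventuality of getting all 1s when f is balanced, and if we include the
alternate way to get unlucky, all 0s, we have


$$P(\mathcal{S} \text{ without replacement}) = 2 \prod_{k=0}^{M-1} \frac{2^{n-1} - k}{2^n - k}.$$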




Completion of Analysis for Unknown f 


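We already know what to do. If we believe we’ll be getting about equal numbers of
balanced and constant functions, we multiply the last result by 1/2,


$$P(\mathcal{W} \text{ without replacement}) = \left( \frac{1}{2} \right) 2 \prod_{k=0}^{M-1} \frac{2^{n-1} - k}{2^n - k} = \prod_{k=0}^{M-1} \frac{2^{n-1} - k}{2^n - k}.$$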


Bounding the Wrong-Guess Probability 


The probability of guessing wrong in the “with replacement” algorithm (under equal
likelihood of getting the two types of functions) was


$$P(\mathcal{W} \text{ with replacement}) = \frac{1}{2^M} = \prod_{k=0}^{M-1} \frac{1}{2},$$


so we want to confirm our suspicion that we have improved our chances of guessing
correctly. We compare the current case with this past case and ask


$$\prod_{k=0}^{M-1} \frac{2^{n-1} - k}{2^n - k} \ \overset{?}{<} \ \prod_{k=0}^{M-1} \frac{1}{2}.$$

Intuitively we already guessed that it must be less, but we can now confirm this with
hard figures. We simply prove that each term in the left product is less than (or in
one case, equal to) each term in the right product.


We want to show that


$$\frac{2^{n-1} - k}{2^n - k} \le \frac{1}{2}.$$

Now is the time we realize that the old grade school fraction test they taught us in
our childhood, usually called “cross-multiplication,”

$$\frac{a}{b} \le \frac{c}{d} \iff ad \le cb \quad (\text{for positive } b \text{ and } d),$$

actually has some use. We apply it by asking

$$2 \left( 2^{n-1} - k \right) \ \overset{?}{\le} \ 2^n - k.$$


The answer is yes, since the LHS is $2^n - 2k$. In fact, except for k = 0 where both sides are equal, the LHS is
strictly less than the RHS. Thus, the product and therefore the probability of error is
actually smaller in the “without replacement” algorithm. The constant time result of
the earlier algorithm therefore guaranteed a constant time result here, but we should
have fewer wrong guesses now.
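The two wrong-guess probabilities can be compared with exact arithmetic, and the product formula can be cross-checked by enumeration for a small domain. The sketch below assumes equal likelihood of balanced and constant functions, as in the text; the function names, the small values of n and M, and the stand-in balanced function are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def p_wrong_without(n, M):
    """Product over k of (2^(n-1) - k) / (2^n - k), for k = 0 .. M-1."""
    result = Fraction(1)
    for k in range(M):
        result *= Fraction(2 ** (n - 1) - k, 2 ** n - k)
    return result

def p_wrong_with(M):
    return Fraction(1, 2 ** M)

# Brute-force check for a small case: ordered samples of distinct inputs.
n, M = 3, 3

def f(x):
    return x & 1                                  # stand-in balanced function

draws = list(permutations(range(2 ** n), M))      # no x is ever repeated
same = sum(1 for xs in draws if len({f(x) for x in xs}) == 1)
p_same = Fraction(same, len(draws))               # P(S | B), without replacement

print(p_same / 2, p_wrong_without(n, M), p_wrong_with(M))
for m in (2, 5, 10):
    print(m, float(p_wrong_without(10, m)), float(p_wrong_with(m)))
```

For M of at least 2 the "without replacement" value should come out strictly smaller, in line with the term-by-term comparison above.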




14.11 A Condition for Constant Time Complexity 
in Non-Deterministic Algorithms 


This section will turn out to be critically important to our analysis of some quantum 
algorithms, ahead. I’m going to define a few terms here before our official coverage, 
but they’re easy and we’ll give these new terms full air time in future lectures. 


14.11.1 Non-Deterministic Algorithms 


At the start of this lesson, I alluded to a type of algorithm, A, that was
non-deterministic (or as I sometimes say to avoid the hyphen, probabilistic). This means
that given any acceptably small error tolerance, $\varepsilon > 0$, the probability that A will
give an inaccurate result is $< \varepsilon$.

We’ll say that A is probabilistic with error tolerance $\varepsilon$.

There is a theorem that we can easily state and prove which will help us in our 
future quantum algorithms. It gives a simple condition that a probabilistic algorithm 
will succeed in constant time. First, a few more definitions. 


14.11.2 Preview of Time Complexity: An Algorithm’s Dependence on Size


Assume an algorithm, A, is potentially dependent on an integer, N, that describes 
the size of the problem to be solved. 


Example 1 


A might be an algorithm to sort the data and N is the number of data records to be 
sorted. 


Example 2 


A might be an algorithm to determine whether a function, f, is balanced and N is 
the number of inputs to the function. 


We'll say that the algorithm A has size N. 


This does not mean that it will take N units of time for A to execute, or even 
that its execution time is dependent in any way on N; it only means that A solves 
a problem associated with a certain number, N, of data elements or inputs, and N 
is allowed to take on an increasingly large value, making the algorithm potentially 
more difficult (or not). A might complete very quickly, independent of its size, or it 
might take longer and longer to complete, as its size, N, grows. 




14.11.3 Looping Algorithms 


Often in quantum computing, we have an algorithm that repeats an identical
measurement (test, experiment) in a loop, and that measurement can be categorized in
one of two ways: success ($\mathcal{S}$) or failure ($\mathcal{F}$). Assume that a measurement (test,
experiment) need only succeed one time in any of the loop passes to end the algorithm
with a declaration of victory: total success. Only if it fails after all loop passes is the
algorithm considered to have failed. Finally, the events $\mathcal{S}$ and $\mathcal{F}$ for any one loop
pass are usually statistically independent of the outcomes on previous loop passes, a
condition we will assume is met.


We'll say that A is a looping algorithm. 


14.11.4 Probabilistic Algorithms with Constant Time Com- 
plexity 


Say we have a looping algorithm A of size N that is probabilistic and can be shown
to complete with the desired confidence (error tolerance) in a fixed number of loop
passes, T, where T is independent of N.


We'll call A a constant time algorithm. 


14.11.5 A Constant Time Condition for Looping Algorithms 


We can now state the theorem, which I'll call the CTC theorem for looping algorithms
(CTC for “constant time complexity”). By the way, this is my terminology. Don’t 
try to use it in mixed company. 


The CTC Theorem for Looping Algorithms. Assume that A is a
probabilistic, looping algorithm having size N. If we can show that the
probability of success for a single loop pass is bounded away from zero,
i.e.,

$$P(\mathcal{S}) \;\ge\; p \;>\; 0,$$

with p independent of the size N, then A is a constant time algorithm.


Proof. Let $\mathcal{S}_k$ be the event of success in the kth loop pass. The hypothesis is
that

$$P(\mathcal{S}_k) \;\ge\; p \;>\; 0, \quad \text{for all } k \ge 1.$$

We are allowing the algorithm to have an error with probability $\varepsilon$. Pick T such that

$$(1-p)^T \;<\; \varepsilon,$$

a condition we can guarantee for large enough T since (1−p) < 1. Note that p being
independent of the size N implies that T is also independent of N. After having
established T, we repeat A's loop T times. The event of failure of our algorithm at
the completion of all T loop passes, which we'll call $\mathcal{F}_{tot}$, can be expressed in terms
of the individual loop pass failures,

$$\mathcal{F}_{tot} \;=\; (\neg\mathcal{S}_1) \wedge (\neg\mathcal{S}_2) \wedge \cdots \wedge (\neg\mathcal{S}_T).$$

Since the events are statistically independent, we can convert this to a probability
using a simple product, and since $P(\neg\mathcal{S}_k) = 1 - P(\mathcal{S}_k) \le 1 - p$ for each k,

$$P(\mathcal{F}_{tot}) \;=\; P(\neg\mathcal{S}_1)\,P(\neg\mathcal{S}_2)\cdots P(\neg\mathcal{S}_T) \;\le\; (1-p)^T \;<\; \varepsilon.$$

We have shown that we can get A to succeed with failure probability $< \varepsilon$ if we allow
A's loop to proceed a fixed number of times, T, independent of its size, N. This is
the definition of constant time complexity. QED


Explicit Formula for T 


To solve for T explicitly, we turn our condition on the integer T into an equality on
a real number t,

$$(1-p)^t \;=\; \varepsilon,$$

solve for t by taking the $\log_{1-p}$ of both sides,

$$t \;=\; \log_{1-p}(\varepsilon),$$

then pick any integer T > t. Of course, taking a log having a non-standard base like
1−p, which is some real number between 0 and 1, is not usually a calculator-friendly
proposition; calculators, not to mention programming language math APIs, tend to
give us the option of only $\log_e$ or $\log_{10}$. No problem, because ...

[Exercise. Show that

$$\log_A x \;=\; \frac{\log x}{\log A},$$

where logs on the RHS are both base 2, both base 10, or both any other base for that
matter.]

Using the exercise to make the condition on t a little more palatable,

$$t \;=\; \frac{\log(\varepsilon)}{\log(1-p)},$$

and combining that with the need for an integer T > t, we offer a single formula for T,

$$T \;=\; \left\lfloor \frac{\log(\varepsilon)}{\log(1-p)} \right\rfloor + 1,$$

where $\lfloor x \rfloor$ is notation for the floor of x, or the greatest integer $\le x$.
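If you would rather let the computer do the arithmetic, the formula translates almost
word for word into code. This is only a sketch: the function name loopPassesNeeded is
my own invention, and it assumes 0 < p < 1 and 0 < ε < 1. It uses the natural log from
the standard library, which is fine because only the ratio of two logs matters.

```cpp
#include <cassert>
#include <cmath>

// Smallest number of loop passes T with (1 - p)^T < epsilon, computed as
// floor( log(epsilon) / log(1 - p) ) + 1.
// Hypothetical helper for illustration; assumes 0 < p < 1 and 0 < epsilon < 1.
long loopPassesNeeded(double p, double epsilon)
{
    assert(p > 0.0 && p < 1.0);
    assert(epsilon > 0.0 && epsilon < 1.0);

    // Real-valued solution t of (1 - p)^t = epsilon; any log base gives the same ratio.
    const double t = std::log(epsilon) / std::log(1.0 - p);

    // Any integer T > t works; floor(t) + 1 is the smallest one.
    return static_cast<long>(std::floor(t)) + 1;
}
```

For instance, with p = 0.5 and ε = 0.001 the real solution is t ≈ 9.97, so the function
returns 10 loop passes.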




Examples with Two Different ps for Success 


It's important to realize that we don't care whether the event of success in each loop
pass, $\mathcal{S}$, is highly probable or highly improbable. We only care that it is bounded
away from 0 by a fixed amount, p, independent of the size of the algorithm, N. Two
examples should crystallize this. In both examples, we assume that we would like to
assure an error probability less than $\varepsilon = .000001 = 10^{-6}$ (that's one in a million).
How many times do we have to loop?


Example 1. $P(\mathcal{S}) = .002$ (Very Improbable). We'll use $\log_{10}$, since my
calculator likes that.

$$T \;=\; \left\lfloor \frac{\log(10^{-6})}{\log(.998)} \right\rfloor + 1
  \;=\; \left\lfloor \frac{-6}{-.00086945871262889} \right\rfloor + 1
  \;=\; \lfloor 6900.8452188 \rfloor + 1 \;=\; 6900 + 1 \;=\; 6901,$$

or, if we want to pick a nice round number, 7000 loop passes.


Example 2. $P(\mathcal{S}) = .266$ (Reasonably Probable). If we have a more
reasonable chance of success for each measurement in a single pass, a little better
than 25%, we would expect to get the same level of confidence, error $\varepsilon = 10^{-6}$, with
far fewer loop passes. Let's see how much faster the algorithm “converges.”

$$T \;=\; \left\lfloor \frac{\log(10^{-6})}{\log(.734)} \right\rfloor + 1
  \;=\; \left\lfloor \frac{-6}{-0.134303940083929467} \right\rfloor + 1
  \;=\; \lfloor 44.67478762 \rfloor + 1 \;=\; 44 + 1 \;=\; 45,$$

or, rounding up, 50 loop passes.


Although our required number of loop passes is understandably sensitive to the 
desired accuracy of our algorithm, it is not dependent on the amount of data or 
number of inputs on which our algorithm works. The algorithm terminates in constant
time regardless of the amount of data in each of the two cases above.
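As a sanity check on the two loop counts, we can compute the failure bound (1−p)^T
directly and confirm that it drops below one in a million at 6901 and 45 passes, but not
one pass earlier. The helper name failureBound is hypothetical, and the sketch assumes
the loop passes are independent, as in the theorem.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the probability that all T independent loop passes fail when
// each pass succeeds with probability at least p, namely (1 - p)^T.
// Hypothetical helper for illustration; assumes 0 < p < 1 and T >= 0.
double failureBound(double p, long T)
{
    assert(p > 0.0 && p < 1.0);
    assert(T >= 0);
    return std::pow(1.0 - p, static_cast<double>(T));
}
```

Notice that N appears nowhere in the function: the bound depends only on p and T, which
is the whole point of the constant time condition.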




Chapter 15 


Computational Complexity 


15.1 Computational Complexity in Quantum Com- 
puting 


Algorithms are often described by their computational complexity. This is a quantita- 
tive expression of how the algorithm’s time and space requirements grow as the data 
set or problem size grows. We’ve already seen examples where the separation between 
the quantum and classical algorithms favors quantum, but the arguments given at 
the time were a little hand-wavy; we didn’t have formal definitions to lean on. We 
filled in our loosely worded probability explanations by supplying a lesson on basic 
probability theory, and now it’s time to do the same with computational complexity. 


15.2 An Algorithm’s Sensitivity to Size 


We design an algorithm to be applied to an unknown function, a number or a set of 
data. It might determine the period of the function, factor the number or sort the 
data. The number of inputs to the function (or size of the number to be factored 
or amount of data to process) could be anything from one to ten trillion or beyond. 
There are many possible solutions to our problem, but once we settle on one that 
seems good, we ask the question, “how does our algorithm’s running time increase
as N increases, where N represents the number of inputs (or number to be factored 
or amount of data) on which the algorithm operates?” 


15.2.1 Some Examples of Time Complexity 


The way an algorithm's running time grows as the size of the problem increases, its
“growth rate,” is known in scientific disciplines as its time complexity. To get
a feel for different growth rates, let's assume that it acts on data sets (not function
inputs or a number to be factored). The following do not define the various time
complexities we are about to study, but they do give you a taste of their consequences.


Constant Time Complexity 


• The algorithm does not depend on the size of the data set, N. It appears to
  terminate in a fixed running time (C seconds) no matter how large N is. Such
  an algorithm is said to have constant time complexity (or be a constant-time
  algorithm).


Polynomial Time Complexity 


• The algorithm takes C seconds to process N data items. We double N and the
  running time seems to double: it takes 2C seconds to process 2N items. If we
  apply it to 8N data items, the running time seems to take 8C seconds. Here,
  the algorithm exhibits linear time complexity (or is a linear algorithm).


• The algorithm takes C seconds to process N data items. We double N and
  now the running time seems to quadruple: it takes $C(2^2) = 4C$ seconds to
  process 2N items. If we apply it to 8N data items the running time seems to
  take $C(8^2) = 64C$ seconds. Now, the algorithm will likely have quadratic time
  complexity (or be a quadratic algorithm).


• The algorithm takes C seconds to process N data items. We double N and
  now the running time seems to increase by a factor of $2^3$: it takes $C(2^3) = 8C$
  seconds to process 2N items. If we apply it to 8N data items the running time
  seems to take $C(8^3) = 512C$ seconds. Now, the algorithm will likely have cubic
  time complexity.


The previous examples (constant, linear, quadratic, cubic) all fall into the general
category of polynomial time complexity, which includes growth rates limited by some
fixed power of N ($N^2$ for quadratic, $N^3$ for cubic, $N^5$ for quintic, etc.).
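The doubling experiments above amount to reading off an exponent: if scaling N by some
factor multiplies the running time by that factor raised to the power p, then p is the
log of the time ratio divided by the log of the scale factor. Here is a minimal sketch
of that calculation. The function name is my own, and real timings are noisy, so treat
the result as a hint about the growth rate rather than a proof of it.

```cpp
#include <cassert>
#include <cmath>

// Estimates the exponent p in T(N) ~ C * N^p from two timings: timeAtN for N
// items and timeAtScaledN for (scale * N) items. If the time ratio equals
// scale^p, then p = log(ratio) / log(scale).
// Hypothetical helper for illustration; assumes positive timings and scale > 1.
double estimatedPolynomialDegree(double timeAtN, double timeAtScaledN, double scale)
{
    assert(timeAtN > 0.0 && timeAtScaledN > 0.0);
    assert(scale > 1.0);
    return std::log(timeAtScaledN / timeAtN) / std::log(scale);
}
```

Feeding it the numbers from the bullets (C becoming 2C when N doubles, or C becoming
512C when N grows eightfold) gives exponents of about 1 and 3, the linear and cubic
cases.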


Non-Polynomial Time Complexity 


Sometimes we can't find a p such that $N^p$ reflects the growth in time as the data grows.
We need a different functional form. Examples include logarithmic and exponential
growth. I won't give an example of the former here; we'll define it rigorously in the
next section. But here's what exponential growth feels like.


• The algorithm processes N items in C seconds (assume N is large enough that
  C > 1). When we double N the running time seems to take $C^2$ seconds.
  If we apply it to 3N data items the running time takes $C^3$ seconds and to
  7N data items the running time takes $C^7$. This is longer/slower/worse than
  polynomial complexity for any polynomial size. Now, the algorithm probably
  has exponential growth.

  (This last example doesn't describe every exponential growth algorithm, by the
  way, but an algorithm satisfying this for some C > 1 would likely be
  exponential.)


15.2.2 Time Complexity vs. Space Complexity 


I've limited our discussion to the realm of time. However, if we were allowed to make
hardware circuitry that grows as the data set grows (or utilize a larger number of
processors from a very large existing farm of computers) then we may not
need more time. We would be trading time complexity for space complexity.


The general term that describes the growth rate of the algorithm in both time
and space is computational complexity.


For the purposes of our course, we’ll usually take the hardware out of the picture 
when measuring complexity and only consider time. There are two reasons for this: 


• Our interest will usually be in the relative speed-up of quantum over classical
  methods. For that, we will be using a hardware black box that does the bulk of
  the computation. We will be asking the question, “how much time does the
  quantum algorithm save us over the classical algorithm when we use the same (or
  spatially equivalent) black boxes in both regimes?”


• Even when we take space into consideration, the circuitry in our algorithms for
  this course will grow linearly at worst (often logarithmically), and we have much
  bigger fish to fry. We're trying to take a very expensive exponential algorithm
  classically and find a polynomial algorithm using quantum computation.
  Therefore, the linear or logarithmic growth of the hardware will be overpowered by
  the time cost in both cases and is therefore ignorable.


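For example, the circuit we used for both Deutsch-Jozsa and Bernstein-Vazirani,

[Circuit diagram: the upper register $|0\rangle^n$ passes through $H^{\otimes n}$, the oracle $U_f$, and
$H^{\otimes n}$, and is then measured; the lower register $|1\rangle$ passes through H and $U_f$, and
its output is ignored.]

had n + 1 inputs (and outputs), so it grows linearly with n (and only logarithmically
with the encoded $N = 2^n$). Furthermore, since this is true for both classical and
quantum algorithms, such growth can be ignored when we compare the two regimes.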


15.2.3 Notation 


To kick things off, we establish the following symbolism for the time taken by an
algorithm to deal with a problem of size N (where you can continue to think of N as
the amount of data, while keeping in mind it might be the number of inputs or the
size of a number to be factored).

$$T_Q(N) \;=\; \text{time required by algorithm } Q \text{ to process } N \text{ elements.}$$

We now explore ways to quantify $T_Q(N)$.


15.3 Big-O Growth


15.3.1 Conflicting Ways to Measure an Algorithm 


When you begin to parse the meaning of an algorithm’s running time, you quickly 
come to a realization. Take a simple linear search algorithm on some random (un- 
sorted) data array, myArray[k]. We will plow through the array from element 0 
through element N — 1 and stop if and when we find the search key, x: 


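   // scan from element 0 upward, stopping as soon as the key is found
   for (k = 0, found = false; k < N && !found; k++)
   {
      if (x == myArray[k])
      {
         found = true;
         foundPosition = k;   // remember where, since k is incremented once more
      }
   }

   if (found)
      cout << x << " found at position " << foundPosition << endl;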


If x is in location myArray[0], the algorithm terminates instantly independent of the
array size: constant time. If it is in the last location, myArray[N-1] (or not in the
list at all), the algorithm will take N loop passes to complete, a time that increases
linearly with N.


If we can't even adjudicate the speed of an algorithm for a single data set, how
do we categorize it in terms of all data sets of a fixed size? We do so by asking three
or four types of more nuanced questions. The most important category of question is
“what happens in the worst case?” This kind of time complexity is called big-O.


To measure big-O time complexity, we stack the cards against ourselves by con- 
structing the worst possible data set that our algorithm could encounter. In the 
search example above, that would be the case in which x was in that last position 
searched. This clears up the ambiguity about where we might find it and tells us that 
the big-O complexity is going to be linear. But wait — we haven't officially defined 
what it means to be linear. 
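To make the best and worst cases concrete, here is a sketch of the same search wrapped
in a function that also counts how many elements it examined. The function name
linearSearchCounting and the use of std::vector are my own choices for illustration,
not part of the loop shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search that also reports how many array elements were examined.
// Returns the index of key, or -1 if key is absent.
// Hypothetical helper for illustration.
long linearSearchCounting(const std::vector<int>& data, int key, std::size_t& comparisons)
{
    comparisons = 0;
    for (std::size_t k = 0; k < data.size(); ++k)
    {
        ++comparisons;                      // one comparison per loop pass
        if (data[k] == key)
            return static_cast<long>(k);    // best case: found on the first pass
    }
    return -1;                              // worst case: every element was examined
}
```

With a five-element array, a key sitting in position 0 costs one comparison, while a key
in the last position, or one that is missing altogether, costs all five. The big-O
analysis only looks at that second number.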




15.3.2 Definition of Big-O


Let's say we have an ordinary function of N, call it f(N). f(N) could be anything.
One example is

$$f(N) \;=\; N^2 + 3N + 75.$$

Another might be

$$f(N) \;=\; N \log N + 2.$$

We wish to compare the growth rate of $T_Q(N)$ with the function f(N). We say that

$$T_Q(N) = O(f(N)) \quad\Longleftrightarrow\quad T_Q \text{ grows no faster than } f(N).$$

We are giving our timing on algorithm Q an upper bound using the function f(N).
In words, we say that “the timing for algorithm Q is big-O of f”. But we still have a
loose end; the wording “grows no faster than” is not very rigorous, so let's clear that
up.

$$T_Q(N) = O(f(N)) \quad\Longleftrightarrow\quad
  \text{there exist positive constants } n_0 \text{ and } c \text{ such that }
  T_Q(N) \le c\,|f(N)|, \ \text{for all } N \ge n_0.$$

This means that while $T_Q(N)$ might start out being much greater than $c\,|f(N)|$ for
small N, eventually it “improves” as N increases to the extent that $T_Q(N)$ will become
and stay $\le c\,|f(N)|$ for all N once we get past $N = n_0$.


Note. Since our comparison function, f(N), will always be non-negative, I will
drop the absolute value signs in many of the descriptions going forward.
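The definition asks for a pair of witnesses, c and n_0, and it can help to try a few
candidates numerically before attempting a proof. The sketch below is an illustration
under my own assumptions: it pretends the timing function is N^2 + 3N + 75 and compares
it against f(N) = N^2. A finite scan like this can refute a proposed pair (c, n_0), but
it cannot prove that the bound holds for all N.

```cpp
#include <cassert>
#include <cmath>

// Stand-in timing function, assumed for illustration: T(N) = N^2 + 3N + 75.
double exampleTiming(double N) { return N * N + 3.0 * N + 75.0; }

// Checks T(N) <= c * |f(N)| for every integer N in [n0, nMax], with f(N) = N^2.
// Hypothetical helper; a passing scan is evidence for the witnesses, not a proof.
bool boundHoldsOnRange(double c, long n0, long nMax)
{
    assert(c > 0.0 && n0 >= 0 && nMax >= n0);
    for (long N = n0; N <= nMax; ++N)
    {
        const double n = static_cast<double>(N);
        if (exampleTiming(n) > c * std::fabs(n * n))
            return false;   // found an N >= n0 that violates the bound
    }
    return true;
}
```

With c = 2 the scan passes from n_0 = 11 onward but fails if we start at n_0 = 10, since
T(10) = 205 exceeds 2 · 100. With c = 1 it fails immediately, because the stand-in
timing function is always larger than N^2 itself.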


15.3.3 Common Terminology for Certain Big-O Growth Rates


Now we can officially define terms like quadratic or exponential growth.


Descriptions of Growth Functions


f(N)                              Informal term for O(f(N))
1                                 constant
log N                             logarithmic
log^2 N                           log-squared
N                                 linear
N log N                           (no special term)
N^2                               quadratic
N^3                               cubic
N^k  (k > 0, integer)             polynomial
N^k log N  (k > 0, integer)       (also) polynomial
2^N                               exponential
N!                                factorial


15.3.4 Factors and Terms We Can Ignore 

You'll note that the table of common big-O terminology doesn't contain functions
like $10N^2$ or $N^4 + N$. There's a good reason for this.

Ignore a Constant Factor K


Instead of declaring an algorithm to be O(1000N), we will say it is O(N) (i.e., linear).
Instead of $O(1.5N^3)$ we will call it $O(N^3)$ (i.e., cubic). Here's why.


Theorem. If $T_Q(N) = O(K f(N))$, for some constant K, then $T_Q(N) = O(f(N))$.


The theorem says we can ignore constant factors and use the simplest version of the
function possible, i.e., $O(N^2)$ vs. $O(3.5N^2)$.


[Exercise. Prove it. Hint. It's easy.]




For Polynomial Big-O Complexity, Ignore All but the Highest Power Term


We only need monomials, never binomials or beyond, when declaring a big-O. For
example, if $T_Q$ is $O(N^4 + N^2 + N + 1)$ it's more concisely $O(N^4)$. Here's why.


Lemma. If j and k are non-negative integers satisfying j < k, then
$N^j \le N^k$, for all $N \ge 1$.


[Exercise. Prove it. Hint. It's easy.]


Now we prove the main claim of the section: for big-O, we can ignore all but
the highest power term in a polynomial.


Theorem. If

$$T_Q \;=\; O\!\left(\sum_{j=0}^{k} a_j N^j\right)$$

then

$$T_Q \;=\; O\!\left(N^k\right).$$

Proof. We have positive constants $n_0$ and c such that

$$T_Q(N) \;\le\; c\,\left|\sum_{j=0}^{k} a_j N^j\right|, \quad \text{for all } N \ge n_0.$$

Pick a positive number a greater than the absolute values of all the coefficients $a_j$,
which will allow us to write

$$c\,\left|\sum_{j=0}^{k} a_j N^j\right| \;\le\; c\,a \sum_{j=0}^{k} N^j, \quad \text{for all } N \ge n_0.$$

Next, if the $n_0$ happened to be = 0 in the big-O criteria, “upgrade” it to 1. Now $n_0$
is $\ge 1$ so we can apply the lemma and replace all the $N^j$ with the higher power, $N^k$,

$$c\,a \sum_{j=0}^{k} N^j \;\le\; c\,a \sum_{j=0}^{k} N^k \;=\; c\,a\,(k+1)\,N^k, \quad \text{for all } N \ge n_0,$$

and we have a new constant, $c' = c\,a\,(k+1)$, with

$$T_Q(N) \;\le\; c'\,\left|N^k\right|, \quad \text{for all } N \ge n_0,$$

making $T_Q$ big-O of $N^k$. QED
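To see the constants at work, here is a small worked instance of the proof. The
polynomial 3N^2 + 5N + 7 and the choice a = 8 are my own illustrative picks, not
something taken from the argument above; c and n_0 ≥ 1 are whatever constants the
big-O hypothesis supplies.

```latex
\begin{align*}
T_Q(N) &\le c\,\bigl(3N^2 + 5N + 7\bigr)          && \text{hypothesis, for all } N \ge n_0 \ge 1 \\
       &\le c \cdot 8\,\bigl(N^2 + N + 1\bigr)     && a = 8 \text{ exceeds every coefficient} \\
       &\le c \cdot 8\,\bigl(N^2 + N^2 + N^2\bigr) && \text{lemma: } N^j \le N^2 \text{ for } j \le 2,\ N \ge 1 \\
       &= 24\,c\,N^2                               && c' = c\,a\,(k+1) = c \cdot 8 \cdot 3 .
\end{align*}
```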




15.4 Ω Growth


15.4.1 Definition of Ω


Sometimes we want the opposite relationship between algorithm Q and a function f.
We want to demonstrate that Q has worse (or at least no better) performance than
f when the data set on which it operates grows. We say that

$$T_Q(N) = \Omega(f(N)) \quad\Longleftrightarrow\quad T_Q \text{ grows at least as fast as } f(N).$$

We are giving our timing on algorithm Q a lower bound using the function f(N). In
words, we say that “the timing for algorithm Q is omega of f”. Quantitatively,

$$T_Q(N) = \Omega(f(N)) \quad\Longleftrightarrow\quad
  \text{there exist positive constants } n_0 \text{ and } c \text{ such that }
  T_Q(N) \ge c\,|f(N)|, \ \text{for all } N \ge n_0.$$

This means that while $T_Q(N)$ might start out being much smaller than $c\,|f(N)|$ for
small N, eventually it will “degrade” as N increases to the extent that $T_Q(N)$ will
become and stay $\ge c\,|f(N)|$ for all N once we get to $N = n_0$.


There are theorems similar to those we proved for big-O complexity that would
apply to Ω time complexity, but they are not critical for our needs, so I'll leave them
as an exercise.


[Exercise. State and prove lemmas and theorems analogous to the ones we proved
for big-O, but applicable to Ω growth.]


15.5 Θ Growth


Next, we express the fact that an algorithm Q is said to grow at exactly (a term
which is not universally used because it could be misinterpreted) the same rate as
some mathematical expression f(N) using the notation

$$T_Q(N) \;=\; \Theta(f(N)).$$

We mean that $T_Q$ grows neither faster nor slower than f(N). In words, we say “the
timing for algorithm Q is theta of f(N)”. Officially,

$$T_Q(N) = \Theta(f(N)) \quad\Longleftrightarrow\quad
  \text{both } T_Q(N) = O(f(N)) \text{ and } T_Q(N) = \Omega(f(N)).$$

Ideally, this is what we want to know about an algorithm. Sometimes, when
programmers informally say an algorithm is big-O of N or log N, they really mean
that it is theta of N or log N, because they have actually narrowed down the growth
rate to being precisely linear or logarithmic. Conversely, if a programmer says an
algorithm is linear or logarithmic or N log N, we don't know what they mean without
qualification by one of the categories, usually big-O or Θ.


15.6 Little-o Growth 


Less frequently, you may come across little-oh notation, that is, $T_Q = o(f(N))$. This
simply means that we not only have an upper bound in f(N), but this upper bound
is, in some sense, too high. One way to say this is that

$$T_Q(N) = o(f(N)) \quad\Longleftrightarrow\quad
  T_Q(N) = O(f(N)), \ \text{but } T_Q(N) \ne \Theta(f(N)).$$


15.7 Easy vs. Hard 


Computational theorists use the term easy to refer to a problem whose (known) 
algorithm has polynomial time complexity. We don’t necessarily bother specifying 
the exact power of the bounding monomial; usually the powers are on the order of 
five or less. However in this course, we’ll prove exactly what they are when we need 
to. 


Hard problems are ones whose only known algorithms have exponential time com- 
plexity. 

A large part of the promise of quantum computing is that it can use quantum 
parallelism and entanglement to take problems that are classically hard and find 
quantum algorithms that are easy. This is called exponential speed-up (sometimes 
qualified with the terms relative or absolute). 


For the remainder of the course we will only tackle two remaining algorithms, but 
they will be whoppers. They both exhibit exponential speed up. 


15.8 Wrap-Up 


This section was a necessarily brief and incomplete study of time complexity because 
we only needed the most fundamental aspects of big-O and Θ growth for the most
obvious and easy-to-state classes: exponential growth, polynomial growth (and a few 
cases of Nlog N growth). When we need them, the analysis we do should be self- 
explanatory, especially with this short section of definitions on which you can fall 
back. 




Chapter 16 


Computational Basis States and 
Modular Arithmetic 


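$$|15 \oplus 3\rangle^4 \;\longleftrightarrow\; |12\rangle^4 \;\longleftrightarrow\; |1100\rangle \;\longleftrightarrow\; (1, 1, 0, 0)^t$$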


16.1 Different Notations Used in Quantum Com- 
puting 


In the quantum computing literature you'll encounter alternative ways to express the 
states of a qubit or the inner workings of an algorithm. If you’re schooled only in 
one style, you might be puzzled when you come across unfamiliar verbiage, especially 
when the author changes dialects in mid-utterance. Today, I want to consolidate a few 
different ways of talking about Hilbert space and computational basis states. This will 
prepare you for the algorithms ahead and enable you to read more advanced papers 
which assume the reader can make such transitions seamlessly. It’s also convenient to 
have this single resource to consult if you’re reading a derivation and suddenly have 
a queasy feeling as the notation starts to get away from you. 


16.2 Notation and Equivalence of Three Environ- 
ments 

16.2.1 First Environment: n-qubit Hilbert Space, $\mathcal{H}_{(n)}$

Single Qubit Hilbert Spaces 


Recall that a one-qubit Hilbert space, $\mathcal{H}$, consists of the 2-D complex vector space with
basis vectors $|0\rangle$ and $|1\rangle$, making its typical state a superposition (always normalized,
of course) of those two vectors, as in

$$|\psi\rangle \;=\; \alpha\,|0\rangle + \beta\,|1\rangle.$$


Multi Qubit Hilbert Spaces 


Multi-qubit computers operate in a tensor product of such spaces, one for each qubit.
This product is a $2^n$-dimensional Hilbert space, which I sometimes label with the
subscript (n) for clarity, as in $\mathcal{H}_{(n)}$. We can think of it as

$$\mathcal{H}_{(n)} \;=\; \underbrace{\mathcal{H} \otimes \mathcal{H} \otimes \cdots \otimes \mathcal{H}}_{n}
  \;=\; \bigotimes_{k=0}^{n-1} \mathcal{H}.$$

The computational basis states, or CBS, of this product space are the $2^n$ vectors of
the form

$$|x\rangle^n \;=\; |x_{n-1}\rangle \otimes |x_{n-2}\rangle \otimes |x_{n-3}\rangle \otimes \cdots \otimes |x_0\rangle
  \;=\; \bigotimes_{k=0}^{n-1} |x_k\rangle,$$

where each $|x_k\rangle$ is either $|0\rangle$ or $|1\rangle$, i.e., a CBS of the kth one-qubit space. We index in
decreasing order from $x_{n-1}$ to $x_0$ because we'll want the right-most bit to correspond
to the least significant bit of the binary number $x_{n-1} \ldots x_1 x_0$.
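Different CBS Notations

One shorthand we have used in the past for this CBS is

$$|x_{n-1}\rangle\,|x_{n-2}\rangle \cdots |x_1\rangle\,|x_0\rangle,$$

and two other common notations we'll need are the decimal integer (encoded) form,
x, and its binary representation,

$$|x\rangle^n, \quad x \in \{0, 1, 2, 3, \ldots, 2^n - 1\}, \quad \text{or}$$

$$|x_{n-1}\,x_{n-2} \ldots x_3\,x_2\,x_1\,x_0\rangle, \quad x_k \in \{0, 1\}.$$

For example, for n = 3,

$$\begin{aligned}
|0\rangle^3 &\;\longleftrightarrow\; |000\rangle, \\
|1\rangle^3 &\;\longleftrightarrow\; |001\rangle, \\
|2\rangle^3 &\;\longleftrightarrow\; |010\rangle, \\
|3\rangle^3 &\;\longleftrightarrow\; |011\rangle, \\
|4\rangle^3 &\;\longleftrightarrow\; |100\rangle,
\end{aligned}$$

and, in general,

$$|x\rangle^3 \;\longleftrightarrow\; |x_2\,x_1\,x_0\rangle.$$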
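Going from the encoded integer to the binary label is just a matter of reading off
bits, most significant first. Here is a minimal sketch. The function name ketLabel is
hypothetical, and plain ASCII "|...>" stands in for the ket brackets.

```cpp
#include <cassert>
#include <string>

// Builds the binary CBS label for the encoded integer x on n qubits, e.g.
// x = 3, n = 3 gives "|011>". Bit x_0 (least significant) is right-most.
// Hypothetical helper for illustration; assumes 1 <= n <= 63 and 0 <= x < 2^n.
std::string ketLabel(unsigned long long x, int n)
{
    assert(n >= 1 && n <= 63);
    assert(x < (1ULL << n));

    std::string label = "|";
    for (int k = n - 1; k >= 0; --k)                 // most significant bit first
        label += ((x >> k) & 1ULL) ? '1' : '0';
    label += ">";
    return label;
}
```

Calling ketLabel(4, 3) produces "|100>", matching the last row of the n = 3 list.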




The 2-dimensional $\mathcal{H} = \mathcal{H}_{(1)}$ and its $2^n$-dimensional products $\mathcal{H}_{(n)}$ are models we
use for quantum computing. However, the problems that arise naturally in math and
computer science are based on simpler number systems. Let's have a look at two such
systems and show that they are equivalent to the CBS of our Hilbert space(s), $\mathcal{H}$.

16.2.2 The Second Environment: The Finite Group $(\mathbb{Z}_2)^n$

Simple Integers

We're all familiar with the integers, $\mathbb{Z}$,

$$\mathbb{Z} \;=\; \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}, \quad \text{usual } +,$$

where I have explicitly stated the operation of interest, namely ordinary addition.
(We're not particularly interested in multiplication at this time.) This is sometimes
called the group of integers, and as a set we know it's infinite, stretching toward $\pm\infty$
in the two directions.


Stepping Stone: $\mathbb{Z}_N$, or mod N Arithmetic


Another group that you may not have encountered in a prior course is the finite group
$\mathbb{Z}_N$, consisting of only the N integers from 0 to N − 1,

$$\mathbb{Z}_N \;=\; \{0, 1, 2, \ldots, N-1\}, \quad \text{“+” is } (+ \bmod N),$$

and this time we are using addition modulo N as the principal operation; if $x + y \ge N$,
we bring it back into the set by taking its remainder after dividing by N:

$$x + y \pmod N \;\equiv\; (x + y)\ \%\ N.$$

The negative of each $x \in \mathbb{Z}_N$ is defined by

$$-x \pmod N \;\equiv\; (N - x) \quad (\text{for } x \ne 0, \text{ with } -0 = 0).$$

Subtraction is defined using the above two definitions, as you would expect,

$$x - y \pmod N \;\equiv\; (x + (-y))\ \%\ N.$$




To make this concrete, we consider the group $\mathbb{Z}_{15}$.


Example Operations mod-15 


i re) 
eal ee. 2 
14 + 14 = 13
     -1 = 14
     -7 = 8
     10 = -5
 14 - 2 = 12
  2 - 8 = 9
  4 - 4 = 0
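These three operations are easy to put into code, with one trap worth knowing: in C++
the % operator can return a negative result when its left operand is negative, so the
sketch below keeps every intermediate value non-negative. The function names are my
own, and the operands are assumed to be already reduced to the range 0 to N − 1.

```cpp
#include <cassert>

// Arithmetic in Z_N for operands already reduced to the range 0 .. N-1.
// Hypothetical helpers for illustration.
int addMod(int x, int y, int N)
{
    assert(N > 0 && x >= 0 && x < N && y >= 0 && y < N);
    return (x + y) % N;
}

// Additive inverse: N - x, reduced so that the inverse of 0 is 0 rather than N.
int negMod(int x, int N)
{
    assert(N > 0 && x >= 0 && x < N);
    return (N - x) % N;
}

// Subtraction defined as addition of the inverse, exactly as in the text.
int subMod(int x, int y, int N)
{
    return addMod(x, negMod(y, N), N);
}
```

With N = 15 these reproduce the mod-15 examples: addMod(14, 14, 15) is 13, negMod(7, 15)
is 8, and subMod(2, 8, 15) is 9.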


Stepping Stone: $\mathbb{Z}_2$ with ⊕ Arithmetic


While $\mathbb{Z}_N$ is important in its own right (and we'll be using it in the future), an
important special case you'll want to embrace is $\mathbb{Z}_N$ for N = 2. The above definition
of $\mathbb{Z}_N$ works for $\mathbb{Z}_2$, but when N = 2 some special notation kicks in:

$$\mathbb{Z}_2 \;=\; \{0, 1\}, \quad \text{“}\oplus\text{” is } (+ \bmod 2).$$

A few consequences of mod-2 addition are

$$\begin{aligned}
0 \oplus 1 &= 1 \\
1 \oplus 0 &= 1 \\
0 \oplus 0 &= 0 \\
1 \oplus 1 &= 0 \\
-1 &= 1 \\
0 - 1 = 0 \oplus 1 &= 1
\end{aligned}$$

Of course ⊕ is nothing other than the familiar XOR operation, although in this
context, we get subtraction and negative mod-2 numbers defined, as well. Also, while
we should be consistent and call subtraction ⊖, the last example shows that we can,
and usually do, use the ordinary “−” operator, even in mod-2 arithmetic. If there is
the potential for confusion, we would tag on the parenthetical “(mod 2).”


An Old Friend with a New Name 


We studied $\mathbb{Z}_2$ under a different name during our introductory lecture on a single
qubit (although you may have skipped that optional section; it was a study of classical
bits and classical gates using the formal language of vector spaces in preparation for
defining the qubit). At the time we didn't use the symbolism $\mathbb{Z}_2$ for the two-element
group, but called it $\mathbb{B}$, the two-element “field” on which we built the vector space $\mathcal{B}$
for the formal definition of classical bits and operators.




Connection Between $\mathbb{Z}_2$ and $\mathcal{H}$


While not officially recognized by any government or municipality, it will help to
observe a very simple relationship between the mod-2 group, $\mathbb{Z}_2$, and the single-qubit
space, $\mathcal{H} = \mathcal{H}_{(1)}$. It's little more than the fact that the dimension of $\mathcal{H}_{(1)}$, 2, equals
the size of the group $\mathbb{Z}_2$, also 2. Moreover, each CBS of $\mathcal{H}_{(1)}$ is labeled using the
digits 0 and 1:

$$\begin{aligned}
\mathbb{Z}_2 &\qquad\ \mathcal{H}_{(1)} \\
0 &\;\longleftrightarrow\; |0\rangle \\
1 &\;\longleftrightarrow\; |1\rangle
\end{aligned}$$

I hasten to add that this connection does not go beyond the 1:1 correspondence listed
above and, in particular, does not extend to the mod-2 addition in $\mathbb{Z}_2$ vs. the vector
addition in $\mathcal{H}$; those are totally separate and possess no similarities. Also, only the
basis states in $\mathcal{H}$ are part of this correspondence; the general state, $|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle$,
has no place in the analogy. As tenuous as it may seem, this connection will help us
in the up-coming analysis.


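Second Environment Completed: $(\mathbb{Z}_2)^n$ with ⊕ Arithmetic


As a set, $(\mathbb{Z}_2)^n$ is simply the n-tuples that have either 0 or 1 as their coordinates,
that is,

$$(\mathbb{Z}_2)^n \;\equiv\; \bigl\{\, (x_{n-1},\, x_{n-2},\, \ldots,\, x_2,\, x_1,\, x_0) \;:\; \text{each } x_k = 0 \text{ or } 1 \,\bigr\},$$

or, in column vector notation,

$$(\mathbb{Z}_2)^n \;=\; \left\{
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix},
\ \ldots,\
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{pmatrix}
\right\}.$$

Notice that we label the 0th coordinate on the far right, or bottom, and the (n − 1)st
coordinate on the far left, or top. This facilitates the association of these vectors with
binary number representations (coming soon) in which the LSB is on the right, and
the MSB is on the left (as in binary 1000 = 8, while 0001 = 1).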




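The additive operation stems from the “$\mathbb{Z}_2$” in its name: it's the component-wise
mod-2 addition, ⊕ or, equivalently, XOR, e.g.,

$$\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} \oplus
  \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \end{pmatrix} \;=\;
  \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}.$$

I'll usually write the elements of $(\mathbb{Z}_2)^n$ in boldface, as in $\mathbf{x}$ or $\mathbf{y}$, to emphasize the
vector point-of-view.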


Common Notation. This set is often written $\{0, 1\}^n$, especially when we don't
care about the addition operation, only n-tuples of 0s and 1s that are used as inputs
to Boolean functions.


$(\mathbb{Z}_2)^n$ is a Vector Space


I've already started calling the objects in $(\mathbb{Z}_2)^n$ vectors, and this truly is an official
designation. Just as $\mathbb{R}^n$ is an n-dimensional vector space over the reals, and $\mathcal{H}_{(n)}$ is a
$2^n$-dimensional vector space over the complex numbers, $(\mathbb{Z}_2)^n$ is a vector space over
$\mathbb{Z}_2$. The natural question arises, what does this even mean?


You know certain things about all vector spaces, a few of which are 


• There is some kind of scalar multiplication, $c\mathbf{v}$.

• There is always some basis and therefore a dimension.

• There is often an inner product defining orthogonality.


All this is true of $(\mathbb{Z}_2)^n$, although the details will be defined as they crop up. For now,
we only care about the vector notation and ⊕ addition. That is, unless you want to
do this ...


[Exercise. Describe all of the above and anything else that needs to be confirmed
to authorize us to call $(\mathbb{Z}_2)^n$ a vector space.]


Caution. For general N, $(\mathbb{Z}_N)^n$ is not a vector space. The enlightened among
you can help the uninitiated understand this in your forums, but it is not something
we will need. What kind of N will lead to a vector space?


Recall. Once again, think back to the vector space that we called $\mathcal{B} = \mathbb{B}^2$. It was
the formal structure that we used to define classical bits. Using the more conventional
language of this lesson, $\mathcal{B}$ is the four-element, 2-dimensional vector space $(\mathbb{Z}_2)^2$.


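Connection Between $(\mathbb{Z}_2)^n$ and $\mathcal{H}_{(n)}$


Let's punch up our previous analogy, bringing it into higher dimensions. We relate
the n-component vectors, $(\mathbb{Z}_2)^n$, and the multi-qubit space, $\mathcal{H}_{(n)}$. As before,
the dimension of $\mathcal{H}_{(n)}$, $2^n$, equals the size of the group $(\mathbb{Z}_2)^n$, also $2^n$. The vectors in
$(\mathbb{Z}_2)^n$ correspond nicely to the CBS in $\mathcal{H}_{(n)}$:

$$\begin{aligned}
(\mathbb{Z}_2)^n \qquad &\qquad\quad \mathcal{H}_{(n)} \\
(0, 0, \ldots, 0, 0, 0)^t &\;\longleftrightarrow\; |00 \cdots 000\rangle \\
(0, 0, \ldots, 0, 0, 1)^t &\;\longleftrightarrow\; |00 \cdots 001\rangle \\
(0, 0, \ldots, 0, 1, 0)^t &\;\longleftrightarrow\; |00 \cdots 010\rangle \\
(0, 0, \ldots, 0, 1, 1)^t &\;\longleftrightarrow\; |00 \cdots 011\rangle \\
(0, 0, \ldots, 1, 0, 0)^t &\;\longleftrightarrow\; |00 \cdots 100\rangle \\
&\ \ \vdots \\
(x_{n-1}, \ldots, x_2, x_1, x_0)^t &\;\longleftrightarrow\; |x_{n-1} \cdots x_2\,x_1\,x_0\rangle
\end{aligned}$$

Again, there is no connection between the respective addition operations, and the
correspondence does not include superposition states of $\mathcal{H}_{(n)}$. Still, the basis states in
$\mathcal{H}_{(n)}$ line up with the vectors in $(\mathbb{Z}_2)^n$, and that's the important thing to remember.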


16.2.3 The Third Environment: The Finite Group $\mathbb{Z}_{2^n}$ with ⊕ Arithmetic


One of our stepping stones above included the finite group $\mathbb{Z}_N$ with mod-N + as its
additive operation,

$$\mathbb{Z}_N \;=\; \{0, 1, 2, \ldots, N-1\}.$$

Now we're going to make two modifications to this. First, we'll restrict the size, N,
to powers of 2, i.e., $N = 2^n$, for some n,

$$\mathbb{Z}_{2^n} \;=\; \{0, 1, 2, 3, \ldots, 2^n - 1\},$$

so each x in $\mathbb{Z}_{2^n}$ has exactly n binary digits.

$$x \;\longleftrightarrow\; x_{n-1}\,x_{n-2} \cdots x_2\,x_1\,x_0, \quad \text{notationally, or}$$

$$x \;=\; \sum_{k=0}^{n-1} x_k\,2^k, \quad \text{as a sum of powers-of-2.}$$

The second change will be to the addition operation. It will be neither normal addition
nor mod-N addition. Instead, we define x + y using the bit-wise ⊕ operator.

$$\begin{aligned}
\text{if} \quad x &= x_{n-1} \cdots x_2\,x_1\,x_0 \\
\text{and} \quad y &= y_{n-1} \cdots y_2\,y_1\,y_0, \\
\text{then} \quad x \oplus y &\equiv (x_{n-1} \oplus y_{n-1}) \cdots (x_2 \oplus y_2)(x_1 \oplus y_1)(x_0 \oplus y_0).
\end{aligned}$$




Note that the RHS of the last line is not a product, but the binary representation 
using its base-2 digits (e.g., 11010001101 ). Another way to say it is 


n-1 


cey = SY (cr @yx)2*. 
k=0 


To eliminate any confusion between this group and the same set under ordinary mod- 
N (mod-2") addition, let’s call ours by its full name, 


(Zon, ®) . 


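Examples of addition in $(\mathbb{Z}_{2^n}, \oplus)$ are

\[
\begin{aligned}
1 \oplus 1 \;&=\; 0, \\
1 \oplus 2 \;&=\; 3, \\
1 \oplus 5 \;&=\; 4, \\
4 \oplus 5 \;&=\; 1, \\
5 \oplus 11 \;&=\; 14, \\
13 \oplus 13 \;&=\; 0, \quad \text{and} \\
15 \oplus 3 \;&=\; 12.
\end{aligned}
\]

Note that for any $x \in (\mathbb{Z}_{2^n}, \oplus)$,

\[
x \oplus x \;=\; 0, \quad \text{so} \quad x \;=\; -x,
\]

i.e., $x$ is its own additive inverse, under bit-wise $\oplus$.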
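
The bit-wise sum defined above is the same operation that many programming languages expose as integer XOR. The following minimal Python sketch recomputes the listed examples from the digit-by-digit definition and compares them with Python's built-in `^` operator. The helper name `xor_sum` and the choice of 4-bit operands are ours, introduced only for illustration.

```python
def xor_sum(x: int, y: int, n: int) -> int:
    """Bit-wise mod-2 sum of two n-bit integers, built digit by digit."""
    total = 0
    for k in range(n):
        x_k = (x >> k) & 1                     # k-th binary digit of x
        y_k = (y >> k) & 1                     # k-th binary digit of y
        total += ((x_k + y_k) % 2) << k        # (x_k XOR y_k) * 2**k
    return total

examples = [(1, 1), (1, 2), (1, 5), (4, 5), (5, 11), (13, 13), (15, 3)]
results = [xor_sum(x, y, 4) for x, y in examples]
print(results)

# The digit-by-digit definition agrees with the built-in XOR operator.
assert all(xor_sum(x, y, 4) == x ^ y for x, y in examples)
# Every element is its own additive inverse.
assert all(xor_sum(x, x, 4) == 0 for x in range(16))
```

Because the built-in operator agrees with the definition, later sketches can write `x ^ y` wherever the text writes the bit-wise sum.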


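Connection Between $(\mathbb{Z}_2)^n$ and $(\mathbb{Z}_{2^n}, \oplus)$


$(\mathbb{Z}_{2^n}, \oplus)$ and $(\mathbb{Z}_2)^n$ are fundamentally two ways to express the same group: they are
isomorphic in group terminology. This is symbolized by

\[
(\mathbb{Z}_{2^n}, \oplus) \;\cong\; (\mathbb{Z}_2)^n,
\]

under the set and operator association

\[
\begin{aligned}
x \;=\; (x_{n-1}\, x_{n-2} \cdots x_1\, x_0) \;&\longleftrightarrow\; \mathbf{x} \;=\;
\begin{pmatrix} x_{n-1} \\ x_{n-2} \\ \vdots \\ x_1 \\ x_0 \end{pmatrix}, \\
x \oplus y \;&\longleftrightarrow\; \mathbf{x} \oplus \mathbf{y}.
\end{aligned}
\]

[Exercise. For those of you fixating on the vector space aspect of $(\mathbb{Z}_2)^n$, you may
as well satisfy your curiosity by writing down why this makes $\mathbb{Z}_{2^n}$ a vector space over
$\mathbb{Z}_2$, one that is isomorphic to (effectively the same as) $(\mathbb{Z}_2)^n$.]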
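
As a concrete check of the isomorphism just described, the sketch below converts encoded integers to bit-vectors and back, then confirms that the bit-wise sum of integers matches component-wise mod-2 addition of the vectors. The function names (`to_bits`, `from_bits`, `vec_add`) and the choice n = 4 are assumptions made for this illustration only.

```python
def to_bits(x: int, n: int) -> tuple:
    """Encoded integer -> bit-vector (x_{n-1}, ..., x_1, x_0)."""
    return tuple((x >> k) & 1 for k in reversed(range(n)))

def from_bits(bits: tuple) -> int:
    """Bit-vector (x_{n-1}, ..., x_0) -> encoded integer."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def vec_add(u: tuple, v: tuple) -> tuple:
    """Component-wise mod-2 addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

n = 4
for x in range(2 ** n):
    assert from_bits(to_bits(x, n)) == x           # the association is one-to-one
    for y in range(2 ** n):
        # The association preserves addition: bits(x XOR y) = bits(x) + bits(y).
        assert to_bits(x ^ y, n) == vec_add(to_bits(x, n), to_bits(y, n))

print(to_bits(13, n))
```

The exhaustive loop is feasible only because n is tiny; it is a sanity check, not a proof.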




16.2.4 Interchangeable Notation of $\mathcal{H}_{(n)}$, $(\mathbb{Z}_2)^n$ and $(\mathbb{Z}_{2^n}, \oplus)$


In practical terms, the relationship between the above three environments allows us
to use bit-vectors,

\[
(1, 1, 0, 1, 0)^t \;=\; \begin{pmatrix} 1 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\]

binary number strings,

\[
11010,
\]

or plain old “encoded” ints

\[
26
\]

interchangeably, at will. One way we’ll take advantage of this is by using plain int
notation in our kets. For n = 5, for example, we might write any of the four equivalent
expressions,

\[
\begin{gathered}
|26\rangle \\
|11010\rangle \\
|1\rangle\, |1\rangle\, |0\rangle\, |1\rangle\, |0\rangle \\
|1\rangle \otimes |1\rangle \otimes |0\rangle \otimes |1\rangle \otimes |0\rangle,
\end{gathered}
\]

usually the first or second. Also, we may add notation to designate the number of
qubits under consideration, as in $|26\rangle^5$ or, to show one possible breakdown, the equivalent
$|3\rangle^2 \otimes |2\rangle^3$.


Hazard

Why are these last two equivalent? We must be careful not to confuse the encoded
CBS notation as if it gave coordinates of CBS kets in the natural basis. It does not.
In other words, $|3\rangle^2$ expressed in natural tensor coordinates is not

\[
\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]

It can’t be for two reasons:

a) $|3\rangle^2$ is a 4-dimensional vector and requires four, not two, coordinates to express
it, and

b) $|3\rangle^2$ is a CBS, and any CBS ket expressed in its own (natural) basis can have
only a single 1 coordinate, the balance of the column consisting of 0 coordinates.

The correct expression of $|3\rangle^2$ in tensor coordinates is

\[
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\]

as can be seen if we construct it from its component tensor coordinates using

\[
|3\rangle^2 \;=\; |1\rangle \otimes |1\rangle
\;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\;=\; \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]

Therefore, to answer this last question “why are the last two expressions equivalent?”
we must first express all vectors in terms of natural coordinates. That would produce
three column vectors (for $|3\rangle^2$, $|2\rangle^3$ and $|26\rangle^5$), each with a single 1 in its
respective column, and only then could we compute the tensor product of the first two,
demonstrating that it was equal to the third.


To head off another possible source of confusion, we must understand why the
tensor product dimension of the two component vectors is not 2 × 3 = 6, contradicting
a possibly (and incorrectly) hoped-for result of 5. Well, the dimensions of these spaces
are not 2, 3, 5 or even 6. Remember that the “exponent” to the upper-right of the ket
designates the order of the Hilbert space. Meanwhile, the dimension of each space
is $2^{\text{order}}$, so these dimensions are actually $2^2$, $2^3$ and $2^5$. Now we can see that the
product space dimension, 32, equals the product of the two component dimensions,
4 × 8, as it should.
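
The coordinate computation described above can be carried out mechanically. The sketch below uses plain Python lists (so that it needs no third-party packages) to build the natural-basis columns and take their tensor product. The helper names `basis_ket` and `kron` are ours, not notation from the text.

```python
def basis_ket(k: int, n: int) -> list:
    """Natural-basis coordinates of the CBS ket |k>^n: a single 1 in slot k."""
    return [1 if i == k else 0 for i in range(2 ** n)]

def kron(u: list, v: list) -> list:
    """Tensor (Kronecker) product of two coordinate columns."""
    return [a * b for a in u for b in v]

ket3_2 = basis_ket(3, 2)       # |3>^2, four coordinates
ket2_3 = basis_ket(2, 3)       # |2>^3, eight coordinates
ket26_5 = basis_ket(26, 5)     # |26>^5, thirty-two coordinates

# |3>^2 is built from |1> tensor |1>, and has four coordinates, not two.
print(ket3_2, kron(basis_ket(1, 1), basis_ket(1, 1)) == ket3_2)

# The product of the two smaller kets is the larger one.
print(kron(ket3_2, ket2_3) == ket26_5)

# Dimensions multiply: 4 * 8 = 32.
print(len(ket3_2), len(ket2_3), len(ket26_5))
```

The single 1 of the product lands in slot 3 · 8 + 2 = 26, which is exactly the encoded integer whose binary string is 11 followed by 010.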


Sums inside Kets 


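Most importantly, if $x$ and $y$ are two elements in $\mathbb{Z}_{2^n}$, we may take their mod-2 sum
inside a ket,

\[
|x \oplus y\rangle,
\]

which means that we are first forming $x \oplus y$, as defined in $(\mathbb{Z}_{2^n}, \oplus)$, and then using
that n-bit answer to signify the CBS associated with it, e.g.,

\[
\begin{aligned}
|1 \oplus 5\rangle \;&=\; |4\rangle, \\
|5 \oplus 11\rangle \;&=\; |14\rangle \quad \text{or, designating a qubit size,} \\
|15 \oplus 3\rangle^4 \;&=\; |12\rangle^4 \quad \text{and} \\
|21 \oplus 21\rangle^5 \;&=\; |0\rangle^5.
\end{aligned}
\]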




Example of Different Notations Applied to Hadamard 


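Recall that the nth order Hadamard operator’s definition, usually given in encoded
binary form, is

\[
H^{\otimes n}\, |x\rangle^n \;=\; \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n - 1} (-1)^{x \odot y}\, |y\rangle^n,
\]

where $\odot$ is the mod-2 dot product based on the individual binary digits in the base-2
representation of $x$ and $y$,

\[
x \odot y \;=\; x_{n-1} y_{n-1} \oplus x_{n-2} y_{n-2} \oplus \cdots \oplus x_1 y_1 \oplus x_0 y_0.
\]

If $x$ and $y$ are represented as vectors in $(\mathbb{Z}_2)^n$ using the boldface $\mathbf{x}$ and $\mathbf{y}$,

\[
\mathbf{x} \;=\; \begin{pmatrix} x_{n-1} \\ x_{n-2} \\ \vdots \\ x_1 \\ x_0 \end{pmatrix},
\qquad
\mathbf{y} \;=\; \begin{pmatrix} y_{n-1} \\ y_{n-2} \\ \vdots \\ y_1 \\ y_0 \end{pmatrix},
\]

the dot product between vector $\mathbf{x}$ and vector $\mathbf{y}$ is also assumed to be the mod-2 dot
product,

\[
\mathbf{x} \cdot \mathbf{y} \;=\; x_{n-1} y_{n-1} \oplus x_{n-2} y_{n-2} \oplus \cdots \oplus x_1 y_1 \oplus x_0 y_0,
\]

giving the alternate form of the Hadamard gate,

\[
H^{\otimes n}\, |\mathbf{x}\rangle^n \;=\; \left( \frac{1}{\sqrt{2}} \right)^{n} \sum_{y=0}^{2^n - 1} (-1)^{\mathbf{x} \cdot \mathbf{y}}\, |\mathbf{y}\rangle^n.
\]

This demonstrates the carefree change of notation often encountered in this and other
quantum computing presentations.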
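
To see the encoded-integer formula at work, the following sketch builds the matrix of the nth order Hadamard operator twice: once from the sign formula with the mod-2 dot product, and once as an n-fold tensor power of the 2 × 2 Hadamard matrix. It then compares the two. The function names and the choice n = 3 are assumptions for illustration; plain Python lists are used so that nothing beyond the standard library is required.

```python
import math

def dot_mod2(x: int, y: int) -> int:
    """Mod-2 dot product of the binary digits of x and y."""
    return bin(x & y).count("1") % 2

def hadamard_by_formula(n: int) -> list:
    """Matrix whose column x holds the amplitudes (-1)^(x.y) / sqrt(2)^n."""
    size = 2 ** n
    scale = (1 / math.sqrt(2)) ** n
    return [[scale * (-1) ** dot_mod2(x, y) for x in range(size)]
            for y in range(size)]

def hadamard_by_tensor(n: int) -> list:
    """n-fold Kronecker power of the 2 x 2 Hadamard matrix."""
    h = 1 / math.sqrt(2)
    H1 = [[h, h], [h, -h]]
    M = [[1.0]]
    for _ in range(n):
        # Kronecker product M (x) H1: row (i, k), column (j, l) = M[i][j] * H1[k][l]
        M = [[a * b for a in row_m for b in row_h]
             for row_m in M for row_h in H1]
    return M

n = 3
A, B = hadamard_by_formula(n), hadamard_by_tensor(n)
agree = all(math.isclose(A[r][c], B[r][c], abs_tol=1e-12)
            for r in range(2 ** n) for c in range(2 ** n))
print(agree)
```

Note that `x & y` multiplies corresponding bits, so counting its 1s and reducing mod 2 is exactly the mod-2 dot product.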


We have completed our review and study of the different notation and language 
used for CBS. As we move forward, we’ll want to add more vocabulary that straddles 
these three mathematical systems, most notably, periodicity, but that’s best deferred 
until we get to the algorithms which require it. 




Chapter 17 


Quantum Oracles 


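[Figure: circuit for Simon’s problem. The input $|0\rangle^n$ passes through $H^{\otimes n}$ into the
block $U_f$, which is labeled “Quantum Oracle”.]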


17.1 Higher Dimensional Oracles and their Time 
Complexity 


We’ve seen oracles in our circuits for Deutsch, Deutsch-Jozsa and Bernstein-Vazirani,
but today we will focus on the oracle itself, not a specific algorithm. Our goals will
be to

• extend our input size to cover any dimension for each of the oracle’s two channels,

• get a visual classification of the matrix for an oracle, and

• define relativized and absolute time complexity, two different ways of measuring
a quantum algorithm’s improvement over a classical algorithm.


The last item relies on an understanding of the oracle’s time complexity, which is why 
it is included in this chapter. 


We’ll continue to use “$U_f$” to represent the oracle for a Boolean function, $f$. Even
as we widen the input channels today, $U_f$ will still have two general inputs: an upper,
A or data register, and a lower, B or target register.

At the top of this chapter I’ve included a circuit that solves Simon’s problem (coming
soon). It reminds us how the oracle relates to the surrounding gates and contains a
wider (n qubit) input to the target than we’ve seen up to now.


17.2 Simplest Oracle: a Boolean Function of One 
Bit 


At the heart of our previous quantum circuits has lurked the unitary transformation
we’ve been calling the quantum oracle or just the oracle. Other gates may be wired
into the circuit around the oracle, but they are usually standard “parts” that we pull
off the shelf like CNOT, Hadamard and other simple gates. The oracle, on the other
hand, is custom designed for the problem to be solved. It typically involves some
function, $f$, that the circuit and its algorithm are meant to explore/discover/categorize.


17.2.1 Circuit and Initial Remarks 


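The simplest quantum oracle is one that arises from a unary Boolean $f$. We defined
such a $U_f$ in the Deutsch circuit. Its action on a general CBS $|x\rangle\, |y\rangle$ is shown in the
following circuit:

\[
\begin{array}{rcl}
|x\rangle & \longrightarrow & |x\rangle \\
 & \boxed{\;U_f\;} & \\
|y\rangle & \longrightarrow & |y \oplus f(x)\rangle
\end{array}
\]

In terms of the effect that the oracle has on the CBS $|x\rangle\, |y\rangle$, which we know to be
shorthand for $|x\rangle \otimes |y\rangle$, the oracle can be described as

\[
|x\rangle\, |y\rangle \;\overset{U_f}{\longmapsto}\; |x\rangle\, |y \oplus f(x)\rangle.
\]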


There are a number of things to establish at the start, some review, others new. 


Initial Remarks (Possibly Review) 


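1. $\oplus$ is the mod-2 addition:

\[
\begin{aligned}
0 \oplus 0 \;&=\; 0 \\
0 \oplus 1 \;&=\; 1 \\
1 \oplus 0 \;&=\; 1 \\
1 \oplus 1 \;&=\; 0
\end{aligned}
\]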


2. This gate can be viewed as a reversible operator for a typically irreversible 
boolean function, f. 


3. The separable state $|x\rangle\, |y\rangle$ coming in from the left represents a very special and
restricted input: a CBS. Whenever any gate is defined in terms of CBS, we
must remember to use linearity and extend the definition to the entire Hilbert
space. We do this by expanding a general ket, such as a 4-dimensional $|\psi\rangle^2$,
along the CBS,

\[
\begin{aligned}
|\psi\rangle^2 \;&=\; c_0\, |0\rangle |0\rangle + c_1\, |0\rangle |1\rangle + c_2\, |1\rangle |0\rangle + c_3\, |1\rangle |1\rangle \\
&=\; c_0\, |0\rangle^2 + c_1\, |1\rangle^2 + c_2\, |2\rangle^2 + c_3\, |3\rangle^2,
\end{aligned}
\]

reading off the output for each of the CBS kets from our oracle description, and
combining those using the complex amplitudes, $c_k$.

4. When a CBS is presented to the input of an oracle like $U_f$, the output happens
to be a separable state (something not true for general unitary gates as we saw
with the BELL operator). In this case, the separable output is $|x\rangle\, |y \oplus f(x)\rangle$.
Considering the last bullet, we can’t expect such a nice separable product when
we present the oracle with some non-basis state, $|\psi\rangle^2$, at its inputs. Take care
not to make the mistake of using the above template directly on non-CBS inputs.
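
The last two remarks can be illustrated numerically. The sketch below extends the CBS rule to a general two-qubit ket by linearity and then feeds the oracle a separable but non-basis input. The choice f(x) = x, the input state, and the helper names are assumptions made for this illustration. The separability test relies on the fact that a two-qubit ket with amplitudes c0, c1, c2, c3 is a product state exactly when c0·c3 = c1·c2.

```python
import math

def apply_oracle(f, amplitudes: list) -> list:
    """Extend |x>|y> -> |x>|y XOR f(x)> linearly to a general two-qubit ket.

    amplitudes[2*x + y] is the coefficient of the CBS ket |x>|y>.
    """
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (y ^ f(x))] += amplitudes[2 * x + y]
    return out

def is_separable(c: list) -> bool:
    """A two-qubit ket is a product state exactly when c0*c3 == c1*c2."""
    return math.isclose(c[0] * c[3], c[1] * c[2], abs_tol=1e-12)

r = 1 / math.sqrt(2)
psi_in = [r, 0.0, r, 0.0]                     # (|0> + |1>)/sqrt(2) tensor |0>
psi_out = apply_oracle(lambda x: x, psi_in)   # hypothetical choice f(x) = x

print(psi_out)
print(is_separable(psi_in), is_separable(psi_out))
```

The input is a product state, yet the output has equal amplitudes on |0>|0> and |1>|1> only, which cannot be factored. Applying the CBS template directly to the non-basis input would have missed this.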


Initial Remarks (Probably New) 


Oracles are often called black boxes, because we computer scientists don’t care how 
the physicists and engineers build them or what’s inside. However, when we specify 
an oracle using any definition (above being only one such example), we have to check 
that certain criteria are met. 


1. The definition of $U_f$’s action on the CBS inputs as described above must result
in unitarity. Should you come across a putative oracle with a slightly off-beat
definition, a quick check of unitarity might be in order (a so-called “sanity
check”).

2. The above circuit is for two one-qubit inputs (that’s a total of 4 dimensions for
our input and output states) based on a function, $f$, that has one bit in and
one bit out. After studying this easy case, we’ll have to extend the definition
and confirm unitarity for

   • multi-qubit input registers taking CBS of the form $|x\rangle^n$ and $|y\rangle^m$, and
   • an $f$ with domain and range larger than the set {0, 1}.

3. The function that we specify needs to be easy to compute in the complexity
sense. A quantum circuit won’t likely help us if a computationally hard function
is inside an oracle. While we may be solving hard problems, we need to find
easy functions on which to build our circuits. This means the functions have to
be computable in polynomial time.

4. Even if the function is easy, the quantum oracle still may be impractical to build
in the near future of quantum computing.




17.2.2 A Two-Qubit Oracle’s Action on the CBS 


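The nice thing about studying single bit functions, $f$, and their two-qubit oracles, $U_f$,
is that we don’t have to work in abstracts. There are so few options, we can compute
each one according to the definitions. The results often reveal patterns that will hold
in the more general cases.

Notation Reminder. If $a$ is a single binary digit, $\overline{a} = \neg a$ is its logical negation
(AKA the bit-flip or not operation),

\[
\overline{a} \;\; (\text{or } \neg a) \;\equiv\;
\begin{cases}
0, & \text{if } a = 1, \\
1, & \text{if } a = 0.
\end{cases}
\]

A short scribble should convince you that $0 \oplus a = a$ and $1 \oplus a = \overline{a}$. Therefore,

\[
\begin{aligned}
|x\rangle\, |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle\, |0 \oplus f(x)\rangle \;=\; |x\rangle\, |f(x)\rangle, \quad \text{and} \\
|x\rangle\, |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle\, |1 \oplus f(x)\rangle \;=\; |x\rangle\, |\overline{f(x)}\rangle.
\end{aligned}
\]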


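17.2.3 Case Study #1: $f(x) \equiv 1$

CBS Table for the Oracle

If $f \equiv 1$, a constant function, we can list all the possible outputs in a four-line table:

\[
\begin{array}{c|c}
|x\rangle\, |y\rangle & U_f\, (|x\rangle\, |y\rangle) \\
\hline
|0\rangle |0\rangle = |0\rangle^2 & |0\rangle |1\rangle = |1\rangle^2 \\
|0\rangle |1\rangle = |1\rangle^2 & |0\rangle |0\rangle = |0\rangle^2 \\
|1\rangle |0\rangle = |2\rangle^2 & |1\rangle |1\rangle = |3\rangle^2 \\
|1\rangle |1\rangle = |3\rangle^2 & |1\rangle |0\rangle = |2\rangle^2
\end{array}
\]

(Remember, $|\;\rangle^2$ does not mean “ket squared”, but is an indicator that this is an
n-fold tensor product state, where n = 2.)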


The Matrix for $U_f$

It’s always helpful to write down the matrix for any linear operator. At the very least
it will usually reveal whether or not the operator is unitary, even though we suspect
without looking at the matrix that $U_f$ is unitary since it is real and is its own inverse.
However, self-invertibility does not always unitarity make, so it’s safest to confirm
unitarity by looking at the matrix.

This is a transformation from 4 dimensions to 4 dimensions, so we need to express
the 4-dimensional basis kets as coordinates. Let’s review the connection between the
four CBS kets of $\mathcal{H}_{(2)}$ and their natural basis coordinates:


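\[
\begin{array}{l|cccc}
\text{component} & |0\rangle |0\rangle & |0\rangle |1\rangle & |1\rangle |0\rangle & |1\rangle |1\rangle \\
\hline
\text{encoded} & |0\rangle^2 & |1\rangle^2 & |2\rangle^2 & |3\rangle^2 \\
\hline
\text{coordinate} &
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} &
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\end{array}
\]

To obtain the matrix, express each $U_f\, (|x\rangle\, |y\rangle)$ as a column vector for each tensor
CBS, $|k\rangle^2$, $k = 0, \ldots, 3$:

\[
\begin{aligned}
\Big( U_f |0\rangle^2 \;\; U_f |1\rangle^2 \;\; U_f |2\rangle^2 \;\; U_f |3\rangle^2 \Big)
\;&=\; \Big( |1\rangle^2 \;\; |0\rangle^2 \;\; |3\rangle^2 \;\; |2\rangle^2 \Big) \\
&=\; \begin{pmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}.
\end{aligned}
\]

Aha. These rows (or columns) are clearly orthonormal, so the matrix is unitary,
meaning the operator is unitary.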


It will be useful to express this matrix in terms of the 2 × 2 Pauli matrix, $\sigma_x$,
associated with the X gate,

\[
U_f \;=\; \begin{pmatrix} \sigma_x & 0 \\ 0 & \sigma_x \end{pmatrix}.
\]

This form reveals a pattern that you may find useful going forward. Whenever a
square matrix M can be broken down into smaller unitary matrices along its diagonal
(0s assumed elsewhere), M will be unitary.

[Exercise. Prove the last statement.]


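17.2.4 Case Study #2: $f(x) = x$

CBS Table for the Oracle

Now, let $f(x) = x$, the identity function. We repeat the process of the first case study
and list all the possible outputs. For reference, here are the key mappings again:

\[
\begin{aligned}
|x\rangle\, |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle\, |f(x)\rangle, \\
|x\rangle\, |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle\, |\overline{f(x)}\rangle.
\end{aligned}
\]

Now, the table for $f(x) = x$:

\[
\begin{array}{c|c}
|x\rangle\, |y\rangle & U_f\, (|x\rangle\, |y\rangle) \\
\hline
|0\rangle |0\rangle = |0\rangle^2 & |0\rangle |0\rangle = |0\rangle^2 \\
|0\rangle |1\rangle = |1\rangle^2 & |0\rangle |1\rangle = |1\rangle^2 \\
|1\rangle |0\rangle = |2\rangle^2 & |1\rangle |1\rangle = |3\rangle^2 \\
|1\rangle |1\rangle = |3\rangle^2 & |1\rangle |0\rangle = |2\rangle^2
\end{array}
\]

The Matrix for $U_f$

We obtain the matrix by expressing each $U_f\, (|x\rangle\, |y\rangle)$ as a column vector:

\[
\begin{aligned}
\Big( U_f |0\rangle^2 \;\; U_f |1\rangle^2 \;\; U_f |2\rangle^2 \;\; U_f |3\rangle^2 \Big)
\;&=\; \Big( |0\rangle^2 \;\; |1\rangle^2 \;\; |3\rangle^2 \;\; |2\rangle^2 \Big) \\
&=\; \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
\;=\; \begin{pmatrix} \mathbb{1} & 0 \\ 0 & \sigma_x \end{pmatrix}.
\end{aligned}
\]

This time, we’ve enlisted the help of the 2 × 2 identity matrix, $\mathbb{1}$. Again, we see how
$U_f$ acts on its inputs while confirming that it is unitary.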


17.2.5 Remaining Cases #3 and #4 


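There are only two more 1-bit functions left to analyze. One of them, $f(x) \equiv 0$, was
covered in our first quantum algorithms lesson under the topic of Oracle for the [0]-op.
We found its matrix to be

\[
U_f \;=\; \begin{pmatrix} \mathbb{1} & 0 \\ 0 & \mathbb{1} \end{pmatrix}.
\]

The last, $f(x) = \overline{x}$, has the matrix

\[
U_f \;=\; \begin{pmatrix} \sigma_x & 0 \\ 0 & \mathbb{1} \end{pmatrix}.
\]

[Exercise. Derive the last matrix.]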
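
All four one-bit cases can be generated by one small routine, which is a convenient way to check hand-derived matrices. The sketch below is a minimal illustration in plain Python: the names `oracle_matrix` and `is_unitary` are ours, and the unitarity test is specialized to real matrices (transpose times matrix equals the identity), which suffices here because every entry is 0 or 1.

```python
def oracle_matrix(f) -> list:
    """4 x 4 matrix of U_f for a one-bit f: column 2x+y is the ket |x>|y XOR f(x)>."""
    U = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            col = 2 * x + y                  # encoded input  |x>|y>
            row = 2 * x + (y ^ f(x))         # encoded output |x>|y XOR f(x)>
            U[row][col] = 1
    return U

def is_unitary(U: list) -> bool:
    """For a real matrix, unitary means U^T U equals the identity."""
    n = len(U)
    return all(sum(U[k][i] * U[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

one_bit_functions = {
    "f(x) = 0": lambda x: 0,
    "f(x) = 1": lambda x: 1,
    "f(x) = x": lambda x: x,
    "f(x) = not x": lambda x: 1 - x,
}
for name, f in one_bit_functions.items():
    U = oracle_matrix(f)
    print(name, U, is_unitary(U))
```

In each printed matrix, the upper-left 2 × 2 block is the identity when f(0) = 0 and the Pauli X matrix when f(0) = 1, and likewise for the lower-right block and f(1).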




17.3 Integers Mod-2 Review 


17.3.1 The Classical f at the Heart of an Oracle 


We will be building oracles based on classical Boolean functions that might have many
inputs and one output,

\[
f : \{0, 1\}^n \longrightarrow \{0, 1\},
\]

or many inputs and many outputs,

\[
f : \{0, 1\}^n \longrightarrow \{0, 1\}^m.
\]

To facilitate this, we’ll need more compact language than $\{0, 1\}^n$, especially as we
will be adding these multi-bit items both inside and outside our kets. We have
such vocabulary in our quiver thanks to the previous lesson on CBS and modular
arithmetic.


17.3.2 Mod-2 Notation for $\{0, 1\}^n$


We learned that $\{0, 1\}^n$ can be thought of in one of three equivalent ways.


Encoded Integer Notation: $(\mathbb{Z}_{2^n}, \oplus)$

This is the group containing the first $2^n$ integers

\[
\mathbb{Z}_{2^n} \;=\; \{\, 0, 1, 2, 3, \ldots, 2^n - 1 \,\}
\]

with mod-2 arithmetic, $\oplus$. There is always the possibility of confusion when we
look at a set like $\mathbb{Z}_N$, for some integer $N > 0$. Usually, it means the numbers
$\{0, 1, 2, \ldots, N-1\}$ with the mod-$N$ addition operation. However, when $N = 2^n$
is a power-of-2, it can often mean (and, for us, does mean) that we are using a bitwise
mod-2 addition, $\oplus$. For this lesson, I’ll use $\mathbb{Z}_{2^n}$ rather than the more cumbersome
$(\mathbb{Z}_{2^n}, \oplus)$ when I want to signify these “encoded” integers with mod-2 addition.


Binary Vector Notation: $(\mathbb{Z}_2)^n$


Here we mean the group consisting of the $2^n$ binary vectors,

\[
(\mathbb{Z}_2)^n \;=\; \left\{
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\;
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix},\;
\ldots,\;
\begin{pmatrix} 1 \\ \vdots \\ 1 \\ 1 \end{pmatrix}
\right\},
\]

with $\oplus$ being component-wise mod-2 addition.


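The Natural Basis for $\mathcal{H}_{(n)}$

Finally, we have the $2^n$ basis kets for $\mathcal{H}_{(n)}$,

\[
\text{CBS for } \mathcal{H}_{(n)} \;=\; \{\, |x\rangle^n, \;\; x = 0, 1, 2, 3, \ldots, 2^n - 1 \,\}.
\]

Seen below, we can use either encoded or binary form to write these basis kets.

\[
\begin{aligned}
|0\rangle^n \;&\longleftrightarrow\; |0 \cdots 000\rangle, \\
|1\rangle^n \;&\longleftrightarrow\; |0 \cdots 001\rangle, \\
|2\rangle^n \;&\longleftrightarrow\; |0 \cdots 010\rangle, \\
|3\rangle^n \;&\longleftrightarrow\; |0 \cdots 011\rangle, \\
|4\rangle^n \;&\longleftrightarrow\; |0 \cdots 100\rangle, \\
&\;\;\vdots \\
|2^n - 1\rangle^n \;&\longleftrightarrow\; |1 \cdots 111\rangle.
\end{aligned}
\]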


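17.3.3 $\oplus$ Inside the Ket


This review might be more than we need, but it will nip a few potentially confusing
situations in the bud, the biggest being the notation

\[
|y \oplus f(x)\rangle
\]

when $y$ and $f(x)$ are more than just labels for the CBS of the 2-dimensional $\mathcal{H}$, 0 and
1. Of course, when they are 0 and 1, we know what to do:

\[
\begin{aligned}
|1 \oplus 0\rangle \;&=\; |1\rangle \quad \text{or} \\
|1 \oplus 1\rangle \;&=\; |0\rangle.
\end{aligned}
\]

But when we are in a higher dimensional Hilbert space that comes about by studying
a function sending $\mathbb{Z}_{2^n}$ into $\mathbb{Z}_{2^m}$, we’ll remember the above correspondence. For
example,

\[
\begin{aligned}
|15 \oplus 3\rangle^4 \;&=\; |12\rangle^4 \quad \text{and} \\
|21 \oplus 21\rangle^5 \;&=\; |0\rangle^5.
\end{aligned}
\]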


With that little review, we can analyze our intermediate and advanced multi-qubit 
oracles. 




17.4 Intermediate Oracle: f is a Boolean Function of Multiple Input Bits


17.4.1 Circuit

A Wider Data Register


Next, we study the circuit specification

\[
\begin{array}{rcl}
|x\rangle^n & \longrightarrow & |x\rangle^n \\
 & \boxed{\;U_f\;} & \\
|y\rangle & \longrightarrow & |y \oplus f(x)\rangle
\end{array}
\]

which arises when the function under study is an $f$ which maps $\{0, 1\}^n \to \{0, 1\}$.
We came across this oracle in Deutsch-Jozsa and Bernstein-Vazirani. We’ll review and
dig deeper.

Such an $f$ would require that an n-qubit $|x\rangle^n$ input be sent to the A register, even as we
maintain the single-qubit $|y\rangle$ going into the B register. This should be intuitively
clear because the oracle’s very definition calls for the output of the B register to be
$|y \oplus f(x)\rangle$, something that can only be accomplished if both

i) $x$ has the right number of bits to satiate $f$ (namely, n), and

ii) the $y$ to be “$\oplus$-ed” with $f(x)$ is commensurate with the output value of $f$
(namely, a single binary bit).

The analysis of the simplest oracle taught us that $U_f$ was expressible as a unitary
4 × 4 matrix with cute, little unitary 2 × 2 matrices (either $\mathbb{1}$ or $\sigma_x$) along its diagonal.
It will turn out this intermediate-level oracle is nothing more than many copies of
those same 2 × 2s affixed to a longer diagonal, one traversing a $2^{n+1} \times 2^{n+1}$ matrix.


17.4.2 An (n + 1)-Qubit Oracle’s Action on the CBS


The general mapping we found so useful in the simple case continues to work for us
here. Even though $x$ is now in the larger domain, $\mathbb{Z}_{2^n}$, $f(x)$ is still restricted to 0
or 1, so we can reuse the familiar identities with a minor change (the first separable
component gets an order n exponent),

\[
\begin{aligned}
|x\rangle^n\, |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |f(x)\rangle, \\
|x\rangle^n\, |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |\overline{f(x)}\rangle.
\end{aligned}
\]


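17.4.3 Analyzing $U_f$ for $x = 0$


For the moment, we won’t commit to a specific $f$, but we will restrict our attention
to the input $x = 0$.

$f(0)$ can be either 0 or 1, and $y$ can be either 0 or 1, giving four possible combinations:

\[
\begin{array}{c|c|c}
|0\rangle^n\, |y\rangle & f(0) & U_f\, (|0\rangle^n\, |y\rangle) \\
\hline
|0\rangle^n |0\rangle = |0\rangle^{n+1} & 0 & |0\rangle^n |f(0)\rangle = |0\rangle^n |0\rangle = |0\rangle^{n+1} \\
|0\rangle^n |1\rangle = |1\rangle^{n+1} & 0 & |0\rangle^n |\overline{f(0)}\rangle = |0\rangle^n |1\rangle = |1\rangle^{n+1} \\
\hline
|0\rangle^n |0\rangle = |0\rangle^{n+1} & 1 & |0\rangle^n |f(0)\rangle = |0\rangle^n |1\rangle = |1\rangle^{n+1} \\
|0\rangle^n |1\rangle = |1\rangle^{n+1} & 1 & |0\rangle^n |\overline{f(0)}\rangle = |0\rangle^n |0\rangle = |0\rangle^{n+1}
\end{array}
\]

This is a little different from our previous table. Rather than completely determining
a 4 × 4 matrix for a particular $f$, it gives us two possible 2 × 2 sub-matrices depending
on the value of $f(0)$.

1. When $f(0) = 0$, the first two columns of the matrix for $U_f$ will be (see upper
two rows of table):

\[
\Big( U_f |0\rangle^{n+1} \;\; U_f |1\rangle^{n+1} \;\; \cdots \Big)
\;=\; \Big( |0\rangle^{n+1} \;\; |1\rangle^{n+1} \;\; \cdots \Big)
\;=\; \left( \begin{array}{cc|c}
1 & 0 & \\
0 & 1 & \\
0 & 0 & ? \\
\vdots & \vdots & \\
0 & 0 &
\end{array} \right)
\;=\; \left( \begin{array}{c|c}
\mathbb{1} & \\
0 & ?
\end{array} \right).
\]

2. When $f(0) = 1$, the first two columns of the matrix for $U_f$ will be (see lower
two rows of table):

\[
\Big( U_f |0\rangle^{n+1} \;\; U_f |1\rangle^{n+1} \;\; \cdots \Big)
\;=\; \Big( |1\rangle^{n+1} \;\; |0\rangle^{n+1} \;\; \cdots \Big)
\;=\; \left( \begin{array}{cc|c}
0 & 1 & \\
1 & 0 & \\
0 & 0 & ? \\
\vdots & \vdots & \\
0 & 0 &
\end{array} \right)
\;=\; \left( \begin{array}{c|c}
\sigma_x & \\
0 & ?
\end{array} \right).
\]

The “?” represents the yet-to-be-studied portion of $U_f$.

So we see that regardless of the value $f(0)$, the upper-left 2 × 2 sub-matrix will be
one of our two familiar unitary matrices, $\mathbb{1}$ or $\sigma_x$. There was nothing special about
$x = 0$. We might have analyzed $x = 1, 2, 3$, or any $x$ up to $2^n - 1$. In fact, we’ll do
that next.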


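17.4.4 Analyzing $U_f$ for General $x$


We continue studying a non-specific $f$ and, like the $x = 0$ case, work with a fixed $x$.
However, this time $x \in \mathbb{Z}_{2^n}$ can be any element in that domain. If it helps, you can
think of a small $x$, like $x = 1$, 2 or 3. The mappings that will drive the math become

\[
\begin{aligned}
|x\rangle^n\, |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |f(x)\rangle, \\
|x\rangle^n\, |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |\overline{f(x)}\rangle.
\end{aligned}
\]

As before, there are two possible values for $f(x)$ and two possible values for $y$, giving four
possible combinations for this $x$:

\[
\begin{array}{c|c|c}
|x\rangle^n\, |y\rangle & f(x) & U_f\, (|x\rangle^n\, |y\rangle) \\
\hline
|x\rangle^n |0\rangle & 0 & |x\rangle^n |f(x)\rangle = |x\rangle^n |0\rangle \\
|x\rangle^n |1\rangle & 0 & |x\rangle^n |\overline{f(x)}\rangle = |x\rangle^n |1\rangle \\
\hline
|x\rangle^n |0\rangle & 1 & |x\rangle^n |f(x)\rangle = |x\rangle^n |1\rangle \\
|x\rangle^n |1\rangle & 1 & |x\rangle^n |\overline{f(x)}\rangle = |x\rangle^n |0\rangle
\end{array}
\]

We computed the first two columns of the matrix for $U_f$ before, and now we
compute two columns of $U_f$ further to the right.

[In Case You Were Wondering. Why did the single value $x = 0$ produce two
columns in the matrix? Because there were two possible values for $y$, 0 and 1, which
gave rise to two different basis kets $|0\rangle^n |0\rangle$ and $|0\rangle^n |1\rangle$. It was those two kets that
we subjected to $U_f$ to produce the first two columns of the matrix. Same thing here,
except now the two basis kets that correspond to the fixed $x$ under consideration are
$|x\rangle^n |0\rangle$ and $|x\rangle^n |1\rangle$, and they will produce matrix columns $2x$ and $2x + 1$.]

1. When $f(x) = 0$, columns $2x$ and $2x + 1$ of the matrix for $U_f$ will be (see upper
two rows of table):

\[
\Big( \cdots \;\; U_f (|x\rangle^n |0\rangle) \;\; U_f (|x\rangle^n |1\rangle) \;\; \cdots \Big)
\;=\; \Big( \cdots \;\; |x\rangle^n |0\rangle \;\; |x\rangle^n |1\rangle \;\; \cdots \Big)
\;=\; \begin{pmatrix}
 & 0 & \\
\cdots & \mathbb{1} & \cdots \\
 & 0 &
\end{pmatrix},
\]

where the displayed block column occupies columns $2x$ and $2x + 1$, and the 2 × 2
identity $\mathbb{1}$ sits in rows $2x$ and $2x + 1$.

2. When $f(x) = 1$, columns $2x$ and $2x + 1$ of the matrix for $U_f$ will be (see lower
two rows of table):

\[
\Big( \cdots \;\; U_f (|x\rangle^n |0\rangle) \;\; U_f (|x\rangle^n |1\rangle) \;\; \cdots \Big)
\;=\; \Big( \cdots \;\; |x\rangle^n |1\rangle \;\; |x\rangle^n |0\rangle \;\; \cdots \Big)
\;=\; \begin{pmatrix}
 & 0 & \\
\cdots & \sigma_x & \cdots \\
 & 0 &
\end{pmatrix},
\]

where, again, the block column occupies columns $2x$ and $2x + 1$, and $\sigma_x$ sits in
rows $2x$ and $2x + 1$.

As you can see, for any $x$ the two columns starting with column $2x$ contain all 0s
away from the diagonal and are either the 2 × 2 identity $\mathbb{1}$ or the Pauli matrix $\sigma_x$ on
the diagonal, giving the matrix the overall form

\[
U_f \;=\; \begin{pmatrix}
[\mathbb{1} \text{ or } \sigma_x] & & & 0 \\
 & [\mathbb{1} \text{ or } \sigma_x] & & \\
 & & \ddots & \\
0 & & & [\mathbb{1} \text{ or } \sigma_x]
\end{pmatrix}.
\]

The oracle is unitary, as we strongly suspected it would be, and we now have a
nice visual image of what it looks like.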
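
The block-diagonal picture can be confirmed numerically for any particular Boolean function. The sketch below builds the full matrix from the CBS rule and inspects each 2 × 2 diagonal block. The function chosen (the parity of the bits of x), the value n = 3, and all helper names are assumptions made for illustration; any other Boolean f could be substituted.

```python
def oracle_matrix(f, n: int) -> list:
    """Matrix of U_f for f: {0,1}^n -> {0,1}; column 2x+y is |x>^n |y XOR f(x)>."""
    size = 2 ** (n + 1)
    U = [[0] * size for _ in range(size)]
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x))][2 * x + y] = 1
    return U

def diagonal_block(U: list, x: int) -> list:
    """The 2 x 2 sub-matrix in rows and columns 2x and 2x+1."""
    return [row[2 * x: 2 * x + 2] for row in U[2 * x: 2 * x + 2]]

IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]

n = 3
f = lambda x: bin(x).count("1") % 2      # hypothetical f: parity of the bits of x
U = oracle_matrix(f, n)

# Each diagonal block is the identity when f(x) = 0 and sigma_x when f(x) = 1.
for x in range(2 ** n):
    expected = SIGMA_X if f(x) == 1 else IDENTITY
    assert diagonal_block(U, x) == expected

# Nothing lives outside the 2 x 2 diagonal blocks.
off_block = sum(U[r][c] for r in range(len(U)) for c in range(len(U))
                if r // 2 != c // 2)
print(off_block)
```

The matrix has 2^(n+1) rows and columns, so this brute-force construction is practical only for small n.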




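17.5 Advanced Oracle: f is a Multi-Valued Function of Multiple Input Bits

17.5.1 Circuit and Vocabulary

A Wider Target Register


Finally, we’ll need to understand the full circuit specification

\[
\begin{array}{rcl}
|x\rangle^n & \longrightarrow & |x\rangle^n \\
 & \boxed{\;U_f\;} & \\
|y\rangle^m & \longrightarrow & |y \oplus f(x)\rangle^m
\end{array}
\]

which arises when the function under study is an $f$ that maps $\mathbb{Z}_{2^n} \to \mathbb{Z}_{2^m}$. In this
case it only makes sense to have an m-qubit B register, as pictured above; otherwise
the sum inside the bottom right output, $|y \oplus f(x)\rangle$, would be ill-defined.


Sometimes m = n


Often, for multi-valued $f$, we can arrange things so that $m = n$ and, in that case, the
circuit will be

\[
\begin{array}{rcl}
|x\rangle^n & \longrightarrow & |x\rangle^n \\
 & \boxed{\;U_f\;} & \\
|y\rangle^n & \longrightarrow & |y \oplus f(x)\rangle^n
\end{array}
\]


17.5.2 Smallest Advanced Oracle: m = 2 (Range of $f \subseteq \mathbb{Z}_{2^2}$)


In this case, $f(x) \in \{0, 1, 2, 3\}$ for each $x \in \mathbb{Z}_{2^n}$. There will be $(n + 2)$ qubits going
into $U_f$ (n qubits into the A register and 2 qubits into the B register).

\[
\begin{array}{rcl}
|x\rangle^n & \longrightarrow & |x\rangle^n \\
 & \boxed{\;U_f\;} & \\
|y\rangle^2 & \longrightarrow & |y \oplus f(x)\rangle^2
\end{array}
\]

This gives a total of $2^{n+2}$ possible input CBS going into the system, making the
dimension of the overall tensor product space, $\mathcal{H}_{(n+2)}$, equal to $2^{n+2}$. The matrix for $U_f$
will, therefore, have size $(2^{n+2} \times 2^{n+2})$.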


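17.5.3 The 4 × 4 Sub-Matrix for a Fixed $x$

We follow the pattern set in the intermediate case and look at a non-specific $f$ but
work with a fixed $x \in \mathbb{Z}_{2^n}$. Because we are assuming $m = 2$, this leaves four possible
images for this one $x$, namely $f(x) = 0$, 1, 2 or 3. As we’ll see, this leads to a 4 × 4
sub-matrix compared to the 2 × 2 sub-matrix we got when studying a fixed $x$ in the
intermediate case.

The maneuver that we have to apply here, which was not needed before, is the
application of the $\oplus$ operator between $y$ and $f(x)$ on a bit-by-bit basis. For $m = 2$
(and using the notation $f(x)_k$ to mean the kth digit of the number $f(x)$), we get

\[
\begin{aligned}
|y \oplus f(x)\rangle^2 \;&=\; |y_1 y_0 \,\oplus\, f(x)_1 f(x)_0\rangle^2 \\
&=\; |\,(y_1 \oplus f(x)_1)\,(y_0 \oplus f(x)_0)\,\rangle^2 \\
&=\; |y_1 \oplus f(x)_1\rangle\; |y_0 \oplus f(x)_0\rangle.
\end{aligned}
\]

The second line is the definition of $\oplus$ in $\mathbb{Z}_{2^2}$, and the $(\cdots)(\cdots)$ inside the ket is not
multiplication but the binary expansion of the number $y_1 y_0 \oplus f(x)_1 f(x)_0$. Thus, our
four combinations of the 2-bit number $y$ with the fixed value $f(x)$ become

\[
\begin{aligned}
|x\rangle^n |0\rangle |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |0 \oplus f(x)_1\rangle\, |0 \oplus f(x)_0\rangle, \\
|x\rangle^n |0\rangle |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |0 \oplus f(x)_1\rangle\, |1 \oplus f(x)_0\rangle, \\
|x\rangle^n |1\rangle |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |1 \oplus f(x)_1\rangle\, |0 \oplus f(x)_0\rangle, \quad \text{and} \\
|x\rangle^n |1\rangle |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |1 \oplus f(x)_1\rangle\, |1 \oplus f(x)_0\rangle.
\end{aligned}
\]

Applying $0 \oplus a = a$ and $1 \oplus a = \overline{a}$ to the RHS of these mappings produces the identities
that will drive the math,

\[
\begin{aligned}
|x\rangle^n |0\rangle |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |f(x)_1\rangle\, |f(x)_0\rangle, \\
|x\rangle^n |0\rangle |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |f(x)_1\rangle\, |\overline{f(x)_0}\rangle, \\
|x\rangle^n |1\rangle |0\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |\overline{f(x)_1}\rangle\, |f(x)_0\rangle, \quad \text{and} \\
|x\rangle^n |1\rangle |1\rangle \;&\overset{U_f}{\longmapsto}\; |x\rangle^n\, |\overline{f(x)_1}\rangle\, |\overline{f(x)_0}\rangle.
\end{aligned}
\]

Of course, for a general $f$, there are now four possible values for $f(x)$, and we
have to combine those with the four possible values for $y$, yielding a whopping 16
combinations for this fixed $x$, as the following table reveals.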


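\[
\begin{array}{c|c|c}
|x\rangle^n\, |y\rangle^2 & f(x) & U_f\, (|x\rangle^n\, |y\rangle^2) \\
\hline
|x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 & 0 & |x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 \\
|x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 & 0 & |x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 \\
|x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 & 0 & |x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 \\
|x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 & 0 & |x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 \\
\hline
|x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 & 1 & |x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 \\
|x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 & 1 & |x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 \\
|x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 & 1 & |x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 \\
|x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 & 1 & |x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 \\
\hline
|x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 & 2 & |x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 \\
|x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 & 2 & |x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 \\
|x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 & 2 & |x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 \\
|x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 & 2 & |x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 \\
\hline
|x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2 & 3 & |x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 \\
|x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 & 3 & |x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 \\
|x\rangle^n |1\rangle |0\rangle = |x\rangle^n |2\rangle^2 & 3 & |x\rangle^n |0\rangle |1\rangle = |x\rangle^n |1\rangle^2 \\
|x\rangle^n |1\rangle |1\rangle = |x\rangle^n |3\rangle^2 & 3 & |x\rangle^n |0\rangle |0\rangle = |x\rangle^n |0\rangle^2
\end{array}
\]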


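Building the matrix for $U_f$ when $m = 2$ is accomplished by considering the four
columns associated with the four possible $f(x)$ values. Because the table has 16 rows,
we will work only a couple of cases in full and see if a pattern emerges.

1. When $f(x) = 0$, columns $4x$ through $4x + 3$ of the matrix for $U_f$ will be (see
topmost four rows of table):

\[
\begin{aligned}
&\Big( \cdots \;\; U_f (|x\rangle^n |0\rangle^2) \;\; U_f (|x\rangle^n |1\rangle^2) \;\; U_f (|x\rangle^n |2\rangle^2) \;\; U_f (|x\rangle^n |3\rangle^2) \;\; \cdots \Big) \\
&\quad =\; \Big( \cdots \;\; |x\rangle^n |0\rangle |0\rangle \;\;\; |x\rangle^n |0\rangle |1\rangle \;\;\; |x\rangle^n |1\rangle |0\rangle \;\;\; |x\rangle^n |1\rangle |1\rangle \;\; \cdots \Big),
\end{aligned}
\]

which translates to the matrix (columns $4x \to 4x + 3$ shown; the four middle rows
are rows $4x \to 4x + 3$)

\[
\left( \begin{array}{cccccc}
 & 0 & 0 & 0 & 0 & \\
 & \vdots & \vdots & \vdots & \vdots & \\
 & 1 & 0 & 0 & 0 & \\
\cdots & 0 & 1 & 0 & 0 & \cdots \\
 & 0 & 0 & 1 & 0 & \\
 & 0 & 0 & 0 & 1 & \\
 & \vdots & \vdots & \vdots & \vdots & \\
 & 0 & 0 & 0 & 0 &
\end{array} \right)
\]

or, more concisely, with $\mathbb{1}$ here denoting the 4 × 4 identity,

\[
\begin{pmatrix}
 & 0 & \\
\cdots & \mathbb{1} & \cdots \\
 & 0 &
\end{pmatrix}.
\]

As you’ll see in a moment, it will facilitate the analysis if we break the 4 × 4
identity matrix into four smaller 2 × 2 matrices, and write this as

\[
\begin{pmatrix}
 & 0 & \\
\cdots & \begin{pmatrix} \mathbb{1} & 0 \\ 0 & \mathbb{1} \end{pmatrix} & \cdots \\
 & 0 &
\end{pmatrix},
\]

where each $\mathbb{1}$ is now the 2 × 2 identity.

This 4 × 4 matrix that appears in columns $4x \to 4x + 3$ is unitary by inspection.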

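2. Let’s skip to the case $f(x) = 3$, and use the table to calculate columns $4x$
through $4x + 3$ of the matrix for $U_f$ (see bottommost four rows of table):

\[
\begin{aligned}
&\Big( \cdots \;\; U_f (|x\rangle^n |0\rangle^2) \;\; U_f (|x\rangle^n |1\rangle^2) \;\; U_f (|x\rangle^n |2\rangle^2) \;\; U_f (|x\rangle^n |3\rangle^2) \;\; \cdots \Big) \\
&\quad =\; \Big( \cdots \;\; |x\rangle^n |1\rangle |1\rangle \;\;\; |x\rangle^n |1\rangle |0\rangle \;\;\; |x\rangle^n |0\rangle |1\rangle \;\;\; |x\rangle^n |0\rangle |0\rangle \;\; \cdots \Big),
\end{aligned}
\]

which translates to the matrix (columns $4x \to 4x + 3$ shown; the four middle rows
are rows $4x \to 4x + 3$)

\[
\left( \begin{array}{cccccc}
 & 0 & 0 & 0 & 0 & \\
 & \vdots & \vdots & \vdots & \vdots & \\
 & 0 & 0 & 0 & 1 & \\
\cdots & 0 & 0 & 1 & 0 & \cdots \\
 & 0 & 1 & 0 & 0 & \\
 & 1 & 0 & 0 & 0 & \\
 & \vdots & \vdots & \vdots & \vdots & \\
 & 0 & 0 & 0 & 0 &
\end{array} \right)
\]

or, more concisely,

\[
\begin{pmatrix}
 & 0 & \\
\cdots & \begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix} & \cdots \\
 & 0 &
\end{pmatrix}.
\]

This 4 × 4 matrix that appears in columns $4x \to 4x + 3$ is unitary by inspection.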


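3. Exercise: When $f(x) = 1$, show that columns $4x \to 4x + 3$ of $U_f$ are

\[
\begin{pmatrix}
 & 0 & \\
\cdots & \begin{pmatrix} \sigma_x & 0 \\ 0 & \sigma_x \end{pmatrix} & \cdots \\
 & 0 &
\end{pmatrix},
\]

with the 4 × 4 block in rows $4x \to 4x + 3$.

This 4 × 4 matrix that appears in columns $4x \to 4x + 3$ is unitary by inspection.

4. Exercise: When $f(x) = 2$, show that columns $4x \to 4x + 3$ of $U_f$ are

\[
\begin{pmatrix}
 & 0 & \\
\cdots & \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix} & \cdots \\
 & 0 &
\end{pmatrix},
\]

with the 4 × 4 block in rows $4x \to 4x + 3$.

This 4 × 4 matrix that appears in columns $4x \to 4x + 3$ is unitary by inspection.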


Summary of Case m = 2 


We’ve covered all four possible values for $f(x)$ and done so for arbitrary $x = 0, \ldots, 2^n - 1$.
What we have discovered is that

• each $x$ controls four columns in the final matrix;

• one of four possible 4 × 4 unitary (sub-)matrices is positioned in these four
columns, with 0s above and below that 4 × 4 matrix;

• the four possible sub-matrices for $f(x) = 0$, 1, 2 and 3, respectively, are

\[
\begin{pmatrix} \mathbb{1} & 0 \\ 0 & \mathbb{1} \end{pmatrix}, \quad
\begin{pmatrix} \sigma_x & 0 \\ 0 & \sigma_x \end{pmatrix}, \quad
\begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix} \quad \text{and} \quad
\begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix};
\]

• since, for each $x$, its 4 × 4 unitary matrix is in rows $4x \to 4x + 3$, with zeros
above and below, it follows (exercise) that there can only be zeros to the left
and right on those rows.

Once again, even though we expected $U_f$ to be unitary, its matrix can be seen
to exhibit this property on its own merit, and the only non-zero elements are on
4 × 4 sub-matrices that lie along the diagonal. This characterizes all oracles for
$f : \mathbb{Z}_{2^n} \to \mathbb{Z}_{2^2}$.


17.5.4 Easy Way to “See” Unitarity 


Now that we’ve analyzed a few oracles, we can see why all $U_f$ will be unitary. In fact,
you might have noticed this even before having studied the examples. Here are the
key points, and I’ll let you fill in the details as an [Exercise].

• Every column is a CBS ket since it is the separable product of CBS kets.

• All CBS kets are unit vectors (all coordinates are 0 except one, which is 1).

• If two columns were identical, $U_f$ would map two different CBS kets to the
same CBS ket.

• Any matrix that maps two different CBS kets to the same CBS ket cannot be
invertible.

• Since $U_f$ is its own inverse, it is invertible, so by the last two bullets, all columns
are distinct unit vectors.

• We conclude that distinct columns have their solitary 1 in different positions.

• The inner product of these columns with themselves is 1 and with other columns
is 0: the matrix has orthonormal columns. QED
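
The argument above says, in effect, that the oracle permutes the CBS kets and undoes itself when applied twice. The sketch below checks both facts without ever building the matrix: it records, for each column, the row where the solitary 1 sits. The function f, the register sizes n = 3 and m = 2, and the helper name `oracle_columns` are hypothetical choices made only for this illustration.

```python
def oracle_columns(f, n: int, m: int) -> list:
    """For each input CBS index 2^m * x + y, the index of the output CBS ket.

    U_f sends |x>^n |y>^m to |x>^n |y XOR f(x)>^m, so each column of the
    matrix has a single 1, located in the row returned here.
    """
    return [(x << m) | (y ^ f(x)) for x in range(2 ** n) for y in range(2 ** m)]

n, m = 3, 2
f = lambda x: (3 * x + 1) % 4            # hypothetical f: Z_8 -> Z_4
rows = oracle_columns(f, n, m)

# Distinct columns have their solitary 1 in different positions ...
print(len(set(rows)) == len(rows))
# ... and applying the map twice returns every CBS ket to itself.
print(all(rows[rows[c]] == c for c in range(len(rows))))
```

A matrix whose columns are distinct CBS kets is a permutation matrix, and every permutation matrix has orthonormal columns, which is the conclusion reached in the bullets.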


Of course, we learn more by describing the form of the matrices, so the activities 
of this chapter have value beyond proving unitarity. 


17.5.5 The General Advanced Oracle: Any m (Range of $f \subseteq \mathbb{Z}_{2^m}$)


Extending Conclusions to Larger $U_f$ Presents No Difficulties


This is where we should apply our intuition and extrapolate the $m = 2$ case to all
$m$. Each time $m$ increases by 1, the number of columns in $U_f$ controlled by a single
input value $x$ doubles. For $m = 3$, we would have 8 × 8 unitary matrices along the
diagonal, each built from appropriate combinations of $\sigma_x$ and $\mathbb{1}$. For $m = 4$, we would
have unitary 16 × 16 matrices along the diagonal. And when $m = n$, we would have
$2^n \times 2^n$ unitary sub-matrices along the diagonal of a really large $2^{2n} \times 2^{2n}$ matrix
for $U_f$. Our explicit demonstrations in the $m = 1$ and 2 cases can be duplicated to
any $m$ with no theoretical difficulty. We’d only be dealing with lengthier tables and
larger sub-matrices. So if the number of output bits of $f$ is $m$, for $m > 2$, the results
are the same: $U_f$ is unitary and it consists of all 0s except near the diagonal where
sub-matrices are built from combinations of $\sigma_x$ and $\mathbb{1}$.


The Diagonal Sub-Matrices are Separable Products


However, if you are interested in one approach to a more rigorous understanding of
$U_f$ for general $m$, start here. In the $m = 2$ case, our four possible matrices comprising
the 4 × 4 sub-matrices along the diagonal are actually tensor products of two matrices:

\[
\begin{aligned}
\begin{pmatrix} \mathbb{1} & 0 \\ 0 & \mathbb{1} \end{pmatrix} \;&=\; \mathbb{1} \otimes \mathbb{1}, \\
\begin{pmatrix} \sigma_x & 0 \\ 0 & \sigma_x \end{pmatrix} \;&=\; \mathbb{1} \otimes \sigma_x, \\
\begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix} \;&=\; \sigma_x \otimes \mathbb{1}, \\
\begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix} \;&=\; \sigma_x \otimes \sigma_x.
\end{aligned}
\]

The components of these tensor products, $\mathbb{1}$ and $\sigma_x$, were the two possible 2 × 2 matrices
that appeared along the diagonal of $U_f$ for $m = 1$.

The tensor product of unitary matrices is easily shown to be unitary. And one
could proceed inductively to demonstrate that the $U_f$ for output size $m + 1$ consists
of sub-matrices along the diagonal, each of which is a tensor product of the potential
sub-matrices available to the $U_f$ for size $m$. This will give both unitarity and
the visual that we have already predicted.
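
The tensor-product structure can be tested for several output sizes at once. In the sketch below, a diagonal block is built in two ways: directly from the rule that column y has its 1 in row y XOR f(x), and as a tensor product with one 2 × 2 factor per bit of f(x) (the Pauli X matrix where the bit is 1, the identity where it is 0, most significant bit first). The helper names and the range of m tested are assumptions for illustration.

```python
def kron(A: list, B: list) -> list:
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]

def block_from_definition(fx: int, m: int) -> list:
    """2^m x 2^m block whose column y has its 1 in row y XOR f(x)."""
    size = 2 ** m
    return [[1 if r == (c ^ fx) else 0 for c in range(size)] for r in range(size)]

def block_from_tensor(fx: int, m: int) -> list:
    """Tensor product over the bits of f(x), most significant first:
    sigma_x where the bit is 1, the identity where it is 0."""
    block = [[1]]
    for k in reversed(range(m)):
        factor = SIGMA_X if (fx >> k) & 1 else IDENTITY
        block = kron(block, factor)
    return block

for m in (1, 2, 3, 4):
    assert all(block_from_definition(fx, m) == block_from_tensor(fx, m)
               for fx in range(2 ** m))

print(block_from_tensor(2, 2))
```

For m = 2 this reproduces the four products listed above; for example, f(x) = 2 has binary digits 1 and 0, giving the Pauli X matrix tensored with the identity.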


[Caution. If we designate some function with $2^n$ inputs and $2^m$ outputs as $f_{n,m}$,
we are not saying that $U_{f_{n,m+1}} = U_{f_{n,m}} \otimes U_{g_{n,m}}$ for two smaller oracles. It isn’t. That
statement doesn’t even track, logically, since it would somehow imply we could
generate an arbitrary $U_f$, which can be exponentially complex, out of repeated products
of something very simple. Rather, we merely noted the fact that the sub-matrices
along the diagonal are, individually, tensor products of the next-smaller m’s possible
diagonal sub-matrices. We still need the full details of $f$ to compute all the smaller
sub-matrices, which will be different (in general) from one another.]




17.6 The Complexity of a Quantum Algorithm Relative to the Oracle


In this course, we are keeping things as simple as possible while attempting to provide
the key ideas in quantum computation. To that end, we’ll make only one key
classification of oracles used in algorithms. You will undoubtedly explore a more rigorous
and theoretical classification in your advanced studies.


Complexity of the Oracle 


The above constructions all demonstrated that we can take an arbitrary function, $f$,
and, in theory, represent it as a reversible gate associated with a unitary matrix. If
you look at the construction, though, you’ll see that any function which requires a full
table of $2^n$ values to represent it (if there is no clear analytical short-cut we can use
to compute it) will likewise end up with a similarly complicated oracle. The oracle
would need an exponentially large number of gates (relative to the number of binary
inputs, $n$). An example will come to us in the form of Simon’s algorithm where we
have a $\mathbb{Z}_{2^n}$-periodic function (notation to be defined) and seek to learn its period.


However, there are functions which we know have polynomial complexity and can 
be realized with a correspondingly simple oracle. An example of this kind of oracle 
is that which appears in Shor’s factoring algorithm (not necessarily Shor’s period- 
finding algorithm). In a factoring algorithm, we know a lot about the function that 
we are trying to crack and can analyze its specific form, proving it to be O(n?). Its 
oracle will also be O(n*). 


Two Categories of a Quantum Algorithm’s Complexity 


Therefore, whenever presenting a quantum algorithm, one should be clear on which 
of the two kinds of oracles are available to the circuit. That distinction will be made 
using the following, somewhat informal (and unconventional) language. 


• Relativized Time Complexity - This is the time complexity of a {circuit +
algorithm} without knowledge of the oracle’s design. If we were later given the 
complexity of the oracle, we would have to modify any prior analysis to account 
for it. Until that time we can only speak of the algorithm and circuit around 
the oracle. We can say that some algorithm is O(n?), e.g., but that doesn’t 
mean it will be when we wire up the oracle. It is O(n?) relative to the oracle. 


• Absolute Time Complexity - This is the time complexity of a {circuit +
algorithm} with knowledge of the oracle’s design. We will know and incorporate 
the oracle’s complexity into the final {circuit + algorithm}’s complexity. 




Chapter 18 


Simon’s Algorithm for 
Period-Finding 


18.1 The Importance of Simon’s Algorithm 


Simon’s algorithm represents a turning point in both the history of quantum computer 
science and in every student’s study of the field. It is the first algorithm that we study 
which represents a substantial advance in relativized time complexity vs. classical 
computing. The problem is exponential classically, even if we allow an approximate 
rather than a deterministic solution. In contrast, the quantum algorithm is very fast, 
O(log? N), where N is the size of the domain of f. In fact, you will see a specific 
algorithm, more detailed than those usually covered, which has complexity O(log? N). 
We'll prove all this after studying the quantum circuit. 


[Before going any further, make sure you didn’t overlook the word “relativized” 
in the first paragraph. As you may recall from a past lecture, it means that we 
don’t have knowledge, in general, about the performance of the circuit’s oracle $U_f$.
Simon’s quantum algorithm gives us a lower bound for the relative complexity, but
that would have to be revised upward if we ended up with an oracle that has a larger
“big-oh.” This is not the fault of QC; the kinds of periodic functions covered in
Simon’s treatment are arbitrarily complex. The function that we test may have a 
nice O(n) or O(n?) implementation, in which case we’ll be in great shape. If not, 
it will become our bottleneck. That said, even with a polynomial-fast oracle, there 
is no classical algorithm that can achieve polynomial time solution, so the quantum 
solution is still a significant result. ] 


Although Simon’s algorithm admittedly solves a toy problem, in the sense that the
problem is not particularly useful, it contains the key ingredients of the relevant
algorithms that follow, most notably Shor’s quantum factoring and encryption-breaking.
Even better, Simon’s algorithm is free of the substantial mathematics required by an
algorithm like Shor’s and thus embodies the essence of quantum computing (in a
noise-free environment) without distracting complexities.




In the problem at hand, we are given a function and asked to find its period. How- 
ever, the function is not a typical mapping of real or complex numbers, and the period 
is not the thing that you studied in your calculus or trigonometry classes. Therefore, 
a short review of periodicity, and its different meanings in distinct environments, is 
in order. 


18.2 Periodicity 


We'll first establish the meaning of periodicity in a typical mathematical context,
the kind you may have seen in a past calculus or trig course. Then we’ll define it in a
more exotic mod-2 environment, $\mathbb{Z}_{2^n} \cong (\mathbb{Z}_2)^n$.


18.2.1 Ordinary Periodicity 


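A function defined on the usual sets, e.g.,

$$ f : \left\{ \begin{matrix} \mathbb{R} \\ \mathbb{C} \\ \mathbb{Z} \end{matrix} \right\} \longrightarrow S \qquad (S \text{ could be } \mathbb{R}, \mathbb{Z}_2, \text{ etc.}), $$

is called periodic if there is a unique smallest a > 0 (called the period),
with

$$ f(x + a) = f(x), $$

for all x in the domain of f.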


(“Unique smallest” is redundant, but it forces us to remember the second of two
separate requirements. First, there must exist some a > 0 with the property. If there
does, we would call the smallest such a its period.) For example,

$$ \sin(x + 2\pi) = \sin(x) \quad \text{for all } x, $$

and $2\pi$ is the smallest positive number with this property, so $a = 2\pi$ is its period. $4\pi$
and $12\pi$ satisfy the equality, but they’re not as small as $2\pi$, so they’re not periods.


18.2.2 $(\mathbb{Z}_2)^n$ Periodicity


Let’s change things up a bit. We define a different sort of periodicity which respects 
not ordinary addition, but mod-2 addition. 


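A function defined on $(\mathbb{Z}_2)^n$,

$$ f : (\mathbb{Z}_2)^n \longrightarrow S \qquad (S \text{ is typically } \mathbb{Z} \text{ or } (\mathbb{Z}_2)^m,\ m \ge n-1), $$

is called $(\mathbb{Z}_2)^n$ periodic if there exists an $\mathbf{a} \in (\mathbb{Z}_2)^n$, $\mathbf{a} \ne \mathbf{0}$, such that,
for all $\mathbf{x} \ne \mathbf{y}$ in $(\mathbb{Z}_2)^n$, we have

$$ f(\mathbf{x}) = f(\mathbf{y}) \iff \mathbf{y} = \mathbf{x} \oplus \mathbf{a}. $$

$\mathbf{a}$ is called the period of f, with $(\mathbb{Z}_2)^n$ periodicity understood by context.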


(Notice that in our definition, we rule out period a = 0. We could have allowed it, 
in which case one-to-one functions — see definition below — are periodic with period 
0, but this adds a trivial extra case to our discussion of Simon’s algorithm which has 
no pedagogical or historical value.) 


There is a subtle but crucial message hidden in the definition. The if-and-only-if
($\iff$) aspect implies that if we find even one pair of elements, $\mathbf{x}' \ne \mathbf{y}'$ with
$f(\mathbf{x}') = f(\mathbf{y}')$, then it must be true that $\mathbf{y}' = \mathbf{x}' \oplus \mathbf{a}$. Once we know that, we have the
pleasant consequence that this one pair can be combined to recover the period using
$\mathbf{a} = \mathbf{x}' \oplus \mathbf{y}'$. (Confirm this for yourself.) This is very different from the more common
periodicity of, say, $\sin x$, wherein $\sin 0 = \sin \pi$, but $\pi$ is not its period since for many
other pairs (like .2 and .2 + $\pi$) $\sin x \ne \sin(x + \pi)$. It turns out that we won’t make
use of this property in the quantum algorithm, but it will help when analyzing the
time complexity of the classical algorithm.
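To see the "one colliding pair gives the period" remark in action, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `period_from_collision` and the sample function `f` are hypothetical (they do not come from the text), integers stand in for bit vectors, and Python's `^` plays the role of mod-2 addition.

```python
def period_from_collision(f, n):
    """Return x ^ y for the first pair x != y with f(x) == f(y), or None."""
    seen = {}  # image value -> first x that produced it
    for x in range(2 ** n):
        w = f(x)
        if w in seen:
            return seen[w] ^ x  # a = x' XOR y'
        seen[w] = x
    return None  # no collision: f is one-to-one on the whole domain


# Hypothetical example with n = 3 and hidden period a = 5 (binary 101).
# min(x, x ^ 5) gives both members of each pair {x, x ^ 5} the same image.
f = lambda x: min(x, x ^ 5)
print(period_from_collision(f, 3))  # prints 5
```

The scan is a classical brute-force search, so it says nothing about the quantum algorithm; it only confirms the arithmetic of recovering a from a single collision.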


Due to the isomorphism between $(\mathbb{Z}_2)^n$ and $(\mathbb{Z}_{2^n}, \oplus)$, we can use the simple integer
notation along with the $\oplus$ operator to effect the same definition, as we demonstrate
next.


18.2.3 $\mathbb{Z}_{2^n}$ Periodicity


f is $\mathbb{Z}_{2^n}$ periodic if there exists an $a \in \mathbb{Z}_{2^n}$, $a \ne 0$, such that,
for all $x \ne y$ in $\mathbb{Z}_{2^n}$, we have

$$ f(x) = f(y) \iff y = x \oplus a. $$

a is called the period of f, with $\mathbb{Z}_{2^n}$ periodicity understood by context.


As I mentioned, this is the same periodicity defined above and has the same
implications. We will use the two periodicity definitions interchangeably going forward.


Examples of $(\mathbb{Z}_2)^n$ Periodicity


We implied that the range of f could be practically any set, S, but for the next few
examples, let’s consider functions that map $\mathbb{Z}_{2^n}$ into itself,

$$ f : \mathbb{Z}_{2^n} \longrightarrow \mathbb{Z}_{2^n}. $$


Collapsing One Bit — Periodic 


A simple $(\mathbb{Z}_2)^n$ periodic function is one that preserves all the bits of x except for one,
say the kth, and turns that one bit into a constant (either 0 or 1).


Let n = 5, k = 2, and the constant, 1. This is a “2nd bit collapse-to-1,” algebraically

$$ f(x) \equiv x_4 \, x_3 \, 1 \, x_1 \, x_0 . $$

If n = 5, k = 4, and the constant was 0, we’d have a “4th bit collapse-to-0,”

$$ g(x) \equiv 0 \, x_3 \, x_2 \, x_1 \, x_0 . $$

Let’s show why the first is $(\mathbb{Z}_2)^n$ periodic with period 4, and this will tell us why all
the others of its ilk are periodic (with possibly a different period), as well.


Notation - Denote the bit-flip operation on the kth bit, $\overline{x_k}$, to mean

$$ \overline{x_k} \equiv \begin{cases} 0, & \text{if } x_k = 1, \\ 1, & \text{if } x_k = 0. \end{cases} $$

[Exercise. Demonstrate that you can effect a bit-flip on the kth bit of x using $x \oplus 2^k$.]

I claim that the “2nd-bit collapse-to-1” function, f, is $(\mathbb{Z}_2)^n$ periodic with period
a = 4. We must show

$$ f(x) = f(y) \iff y = x \oplus 4. $$


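• $\Leftarrow$: Assume $y = x \oplus 4$. That means (in vector notation)

$$ x = \begin{pmatrix} x_4 \\ x_3 \\ x_2 \\ x_1 \\ x_0 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_4 \\ y_3 \\ y_2 \\ y_1 \\ y_0 \end{pmatrix} = \begin{pmatrix} x_4 \\ x_3 \\ \overline{x_2} \\ x_1 \\ x_0 \end{pmatrix}. $$

Then,

$$ f(x) = \begin{pmatrix} x_4 \\ x_3 \\ 1 \\ x_1 \\ x_0 \end{pmatrix}, \quad \text{but, also} \quad f(y) = \begin{pmatrix} x_4 \\ x_3 \\ 1 \\ x_1 \\ x_0 \end{pmatrix}, $$

proving f(x) = f(y).


• $\Rightarrow$: Assume f(x) = f(y) for $x \ne y$. Since f does not modify any bits other
than bit 2, we conclude $y_k = x_k$, except for (possibly) bit k = 2. But since
$x \ne y$, some bit must be different between them, so it has to be bit-2. That is,

$$ y = \begin{pmatrix} y_4 \\ y_3 \\ y_2 \\ y_1 \\ y_0 \end{pmatrix} = \begin{pmatrix} x_4 \\ x_3 \\ \overline{x_2} \\ x_1 \\ x_0 \end{pmatrix} = x \oplus 4, $$

showing that $y = x \oplus 4$. QED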


Of course, we could have collapsed the 2nd bit to 0, or used any other bit, and gotten
the same result. A single bit collapse is $(\mathbb{Z}_2)^n$ periodic with period $2^k$, where k is the
bit being collapsed.
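The proof above can also be checked by brute force for small n. The following Python sketch is only an illustration: `collapse_bit` and `is_periodic` are hypothetical helper names, integers stand in for bit strings, and `^` is mod-2 addition.

```python
def collapse_bit(x, k, const):
    """Force bit k of x to const (0 or 1) and keep every other bit."""
    return (x | (1 << k)) if const else (x & ~(1 << k))


def is_periodic(f, n, a):
    """Check f(x) == f(y)  <=>  y == x ^ a for all x != y in Z_{2^n}."""
    if a == 0:
        return False  # the definition rules out period 0
    size = 2 ** n
    return all(
        (f(x) == f(y)) == (y == x ^ a)
        for x in range(size) for y in range(size) if x != y
    )


f = lambda x: collapse_bit(x, 2, 1)   # the "2nd bit collapse-to-1" example
g = lambda x: collapse_bit(x, 4, 0)   # the "4th bit collapse-to-0" example
print(is_periodic(f, 5, 4), is_periodic(g, 5, 16))  # prints True True
```

Trying any other candidate period, say `is_periodic(f, 5, 2)`, returns `False`, in line with the claim that the period is $2^k$ for the collapsed bit k.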


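Note - With single bit collapses, there are always exactly two numbers from the
domain $\mathbb{Z}_{2^n}$ that map into the same range number in $\mathbb{Z}_{2^n}$. With the example, f, just
discussed,

$$ \left. \begin{matrix} 00100 \\ 00000 \end{matrix} \right\} \longmapsto 00100, \qquad \left. \begin{matrix} 10101 \\ 10001 \end{matrix} \right\} \longmapsto 10101, \qquad \left. \begin{matrix} 11111 \\ 11011 \end{matrix} \right\} \longmapsto 11111, $$

and so on.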


Collapsing More than One Bit — Not Periodic 


A collapse of two or more bits can never be periodic, as we now show. For illustration,
let’s stick with n = 5, but consider a simultaneous collapse-to-1 of both the 2nd and
0th bit,

$$ f(x) \equiv \begin{pmatrix} x_4 \\ x_3 \\ 1 \\ x_1 \\ 1 \end{pmatrix} = x_4 \, x_3 \, 1 \, x_1 \, 1 . $$

In this situation, for any x in the domain of f, you can find three others that map to
the same f(x). For example,

$$ \left. \begin{matrix} 00000 = 0 \\ 00001 = 1 \\ 00100 = 4 \\ 00101 = 5 \end{matrix} \right\} \longmapsto 00101 = 5. $$

If there were some period, a, for this function, it would have to work for the first two
listed above, meaning, f(0) = f(1). For that to be true, we’d need $1 = 0 \oplus a$, which
forces a = 1. But a = 1 won’t work when you consider the first and third x, above:

$$ f(0) = f(4), \quad \text{yet} \quad 4 \ne 0 \oplus 1. $$

As you can see, this comes about because there are too many xs that get mapped
to the same f(x).


Let’s summarize: bit collapsing gives us a periodic function only if we collapse 
exactly one bit (any bit) to a constant (either 0 or 1). However, a function which is 
a “multi-bit-collapser” (preserving the rest) can never be periodic. 
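A brute-force search over every candidate period makes the summary concrete. This is a sketch only: `find_period`, `one_bit` and `two_bit` are hypothetical names introduced for illustration, and the search is exponential in n, so it is usable only for tiny examples like n = 5.

```python
def find_period(f, n):
    """Return the period a if f is Z_{2^n} periodic, else None (brute force)."""
    size = 2 ** n
    for a in range(1, size):
        if all((f(x) == f(y)) == (y == x ^ a)
               for x in range(size) for y in range(size) if x != y):
            return a
    return None


one_bit = lambda x: x | 0b00100   # collapse bit 2 to 1
two_bit = lambda x: x | 0b00101   # collapse bits 2 and 0 to 1

print(find_period(one_bit, 5))    # prints 4
print(find_period(two_bit, 5))    # prints None
print([x for x in range(32) if two_bit(x) == 5])  # prints [0, 1, 4, 5]
```

The last line lists the four pre-images of 5 under the two-bit collapse, the same four numbers used in the argument above.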


So, what are the other periodic functions in the $\mathbb{Z}_{2^n}$ milieu?


18.2.4 1-to-1, 2-to-1 and n-to-1 


The answer requires a simple definition. First some more terminology. 


Domain and Range 


If f is a function, we'll use “dom(f)” to designate the domain of f, and “ran(f)” to
designate the range of f. If $w \in \mathrm{ran}(f)$, the “pre-image” of w is the set $\{x \mid f(x) = w\}$.


n-To-One 


One-To-One: We say “f is 1-to-1” when $x \ne y \Rightarrow f(x) \ne f(y)$, for
all x, y in dom(f).


If $S \subseteq \mathrm{dom}(f)$ then, even if f is not 1-to-1, overall, it can still be 1-to-1 on S.
(Student to fill in the adjustments to the definition).


Two-To-One: We say “f is 2-to-1” when every w in ran(f) has exactly 
two pre-images in dom(f). 


n-To-One: We say “f is n-to-1” when every w in ran(f) has exactly n 
pre-images in dom(f). 
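For a function on a small finite domain, the n-to-1 property can be tested directly by counting pre-images. Here is a minimal Python sketch; the helper name `fold` is hypothetical, and the sample functions are illustrative choices rather than examples taken from the definitions above.

```python
from collections import Counter


def fold(f, domain):
    """Return n if f is n-to-1 on domain, or None if pre-image sizes differ."""
    sizes = set(Counter(f(x) for x in domain).values())
    return sizes.pop() if len(sizes) == 1 else None


print(fold(lambda x: x, range(32)))          # prints 1
print(fold(lambda x: x | 0b00100, range(32)))  # prints 2
print(fold(lambda x: x | 0b00101, range(32)))  # prints 4
print(fold(lambda x: min(x, 3), range(8)))   # prints None
```

The last function sends 3, 4, 5, 6 and 7 to the same value but is one-to-one elsewhere, so it is not n-to-1 for any n.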


Lots of Periodic Functions 


(In this small section we will set up a concept that you'll find useful in many future 
quantum algorithms: the partitioning of dom(f) into subsets.) 


We now characterize all $\mathbb{Z}_{2^n}$ periodic functions.


Let’s say you have one in your hands, call it f, and you even know its period, a.
Buy three large trash bins at your local hardware store. Label one R, one Q and the
third, S (for Source pool). Dump all $x \in \mathrm{dom}(f)$ (which is $\mathbb{Z}_{2^n}$) into the source pool,
S. We’re going to be moving numbers from the source, S, into R and Q according to
this plan:


1. Pick any $x \in S = \mathbb{Z}_{2^n}$. Call it $r_0$ (0, because it’s our first pick).


2. Generate $r_0$’s partner $q_0 = r_0 \oplus a$ (partner, because periodicity guarantees that
f maps both $r_0$ and $q_0$ to the same image value and also that there is no other
$x \in \mathbb{Z}_{2^n}$ which maps to that value.)


3. Toss $r_0$ into bin R and its partner, $q_0$, (which you may have to dig around in S
to find) into bin Q.


4. We have reduced the population of the source bin, S, by two: $S = S - \{r_0, q_0\}$.
Keep going ...


5. Pick a new x from what’s left of S. Call it $r_1$ (1, because it’s our second pick).


6. Generate $r_1$’s partner $q_1 = r_1 \oplus a$. (Again, we know that f maps both $r_1$ and
$q_1$ to the same image value, and there is no other $x \in \mathbb{Z}_{2^n}$ which maps to that
value.)


7. Toss $r_1$ into bin R and its partner, $q_1$, into bin Q. S is further reduced by two
and is now $S = S - \{r_0, q_0, r_1, q_1\}$.


8. Repeat this activity, each pass moving one value from bin S into bin R and its
partner into bin Q until we have none of the original domain numbers left in S.


9. When we’re done, half of the xs from dom(f) will have ended up in R and the
other half in Q. (However, since we chose the first of each pair at random, this
was not a unique division of dom(f), but that’s not important.)


Here is the picture, when we’re done with the above process:

$$ \mathbb{Z}_{2^n} = R \cup Q = \{ \ldots, r_k, \ldots \} \cup \{ \ldots, q_k, \ldots \} $$

where R and Q are of equal size, and

$$ f(r_k) = f(q_k), $$

but f is otherwise one-to-one on R and Q, individually.


From this construction, we can see a couple things:


1. Every periodic function in the $\mathbb{Z}_{2^n}$ sense is 2-to-1. If a function is not 2-to-1, it
doesn’t have a prayer of being periodic.


2. We have a way to produce arbitrary $\mathbb{Z}_{2^n}$ periodic functions. Pick a number
you want to be the period and call it a. Start picking $r_k$ numbers from $\mathbb{Z}_{2^n}$
at random, each pick followed immediately by the computation of its partner,
$q_k = r_k \oplus a$. Assign any image value you like to these two numbers. (The
assigned image values don’t even have to come from $\mathbb{Z}_{2^n}$; they can be the
chickens in your yard or the contacts in your phone list. All that matters is
that once you use one image value for an $r_k$, $q_k$ pair, you don’t assign it to any
future pair.) This will define a $\mathbb{Z}_{2^n}$ periodic function with period a.
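The recipe in item 2 translates almost line for line into code. The sketch below is a hedged illustration: `random_periodic_function` is a hypothetical name, the image values are drawn from $\mathbb{Z}_{2^n}$ purely for convenience (any set of distinct labels would do), and the function is returned as a lookup table (a Python dict).

```python
import random


def random_periodic_function(n, a, seed=None):
    """Build a lookup table for a Z_{2^n} periodic function with period a != 0."""
    if not 0 < a < 2 ** n:
        raise ValueError("period must satisfy 0 < a < 2**n")
    rng = random.Random(seed)
    source = set(range(2 ** n))          # bin S: every x in dom(f)
    images = list(range(2 ** n))         # candidate image values (assumption)
    rng.shuffle(images)
    table = {}
    while source:
        r = rng.choice(sorted(source))   # pick r_k from what is left of S
        q = r ^ a                        # its partner q_k = r_k XOR a
        w = images.pop()                 # an image value never used before
        table[r] = table[q] = w
        source -= {r, q}                 # S shrinks by two on every pass
    return table


table = random_periodic_function(3, 5, seed=0)
print(all(table[x] == table[x ^ 5] for x in range(8)))  # prints True
```

Because each image value is used for exactly one pair, the resulting table is 2-to-1, as item 1 says every periodic function must be.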


[Exercise. Generate two different $\mathbb{Z}_{2^n}$ periodic functions.]


18.3 Simon’s Problem 


We’ve developed all the vocabulary and intuition necessary to understand the state- 
ment of Simon’s problem, and the quantum fun can now begin. 


Statement of Simon’s Problem 


Let $f : (\mathbb{Z}_2)^n \longrightarrow (\mathbb{Z}_2)^n$ be $(\mathbb{Z}_2)^n$ periodic.

Find $\mathbf{a}$.

(I made $\mathbf{a}$ bold because I phrased the problem in terms of the vector space $(\mathbb{Z}_2)^n$;
had I used the integer notation of $\mathbb{Z}_{2^n}$, I would have said “find a”, without the bold.)


It’s not really necessary that ran(f) be the same $(\mathbb{Z}_2)^n$ as that of its domain. In
fact, the range is quite irrelevant. However, the assumption facilitates the learning 
process, and once we have our algorithm we can lose it. 


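Let’s summarize some of the consequences of $(\mathbb{Z}_2)^n$ periodicity (for $\mathbf{x} \ne \mathbf{y}$, as in the
definition).


1. $f(\mathbf{y}) = f(\mathbf{x}) \iff \mathbf{y} = \mathbf{x} \oplus \mathbf{a}$


2. $f(\mathbf{y}) = f(\mathbf{x}) \implies \mathbf{a} = \mathbf{x} \oplus \mathbf{y}$


3. f is two-to-one on $(\mathbb{Z}_2)^n$


We can also state this in the equivalent $\mathbb{Z}_{2^n}$ language.


1. $f(y) = f(x) \iff y = x \oplus a$


2. $f(y) = f(x) \implies a = x \oplus y$


3. f is two-to-one on $\mathbb{Z}_{2^n}$


Pay particular attention to bullet 2. If we get a single pair that maps to the same
value, f(x) = f(y), we will have our a. This will be used in the classical (although not
quantum) analysis.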




18.4 Simon’s Quantum Circuit Overview and the Master Plan


18.4.1 The Circuit 


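A bird’s eye view of the total circuit will give us an idea of what’s ahead.

[Circuit diagram: the upper register, $|0\rangle^n$, passes through $H^{\otimes n}$, the oracle $U_f$ and a second $H^{\otimes n}$ before being measured; the lower register, $|0\rangle^n$, passes through $U_f$ and is then measured.]

You see a familiar pattern. There are two multi-dimensional registers, the upper
(which I will call the A register, the data register or even the top line, at my whim),
and the lower (which I will call the B register, target register or bottom line,
correspondingly).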


This is almost identical to the circuits of our recent algorithms, with the following 
changes: 


• The target channel is “hatched,” reflecting that it has n component lines rather
than one.


• We are sending a $|0\rangle^n$ into the bottom instead of a $|1\rangle$ (or even $|1\rangle^n$).


• We seem to be doing a measurement of both output registers rather than
ignoring the target.


In fact, that third bullet concerning measuring the bottom register will turn out to be 
conceptual rather than actual. We could measure it, and it would cause the desired 
collapse of the upper register, however our analysis will reveal that we really don’t 
have to. Nevertheless, we will keep it in the circuit to facilitate our understanding, 
label it as “conceptual,” and then abandon the measurement in the end when we are 
certain it has no practical value. 


Here is the picture I'll be using for the remainder of the lesson.

[Circuit diagram: the same circuit, with the A register measurement labeled “(actual)” and the B register measurement labeled “(conceptual)”.]


[Note: I am suppressing the hatched quantum wires to produce a cleaner circuit.
Since every channel has n lines built into it and we clearly see the kets and operators
labeled with the “exponent” n, the hatched wires no longer serve a purpose.]




18.4.2 The Plan 

We will prepare a couple CBS kets for input to our circuit, this time both will be $|0\rangle^n$.
The data channel (top) will first encounter a multi-dimensional Hadamard gate to
create a familiar superposition at the top. This sets up quantum parallelism which we
found to be pivotal in past algorithms. The target channel’s $|0\rangle^n$ will be sent directly
into the oracle without pre-processing. This is the first time we will have started with
a $|0\rangle$ rather than a $|1\rangle$ in this channel, a hint that we’re not going to get a phase
kick-back today. Instead, the generalized Born rule (QM Trait #15'') will turn out
to be our best friend.


[Preview: When we expect to achieve our goals by applying the Born rule to a
superposition, the oracle’s target register should normally be fed a $|0\rangle^n$ rather than
a $|1\rangle^n$.]

After the oracle, both registers will become entangled. 


At that point, we conceptually test the B register output. This causes a collapse 
of both the top and bottom lines’ states (from the Born rule), enabling us to know 
something about the A register. We'll analyze the A register’s output — which re- 
sulted from this conceptual B register measurement — and discover that it has very 
special properties. Post processing the A register output by a second “re-organizing” 
Hadamard gate will seal the deal. 


In the end, we may as well have measured the A register to begin with, since 
quantum entanglement authorizes the collapse using either line, and the A register is 
what we really care about. 


Strategy 


Our strategy will be to “load the dice” by creating a quantum circuit that spits out
measured states which are “orthogonal” to the period a, i.e., $z \cdot a = 0$ mod-2. (This is
not a true orthogonality, as we’ll see, but everyone uses the term and so shall we. We'll
discover that states which are orthogonal to a can often include the “vector” a, itself,
another reason for the extra care with which we analyze our resulting “orthogonal”
states.)


That sounds paradoxical; after all, we are looking for a, so why search for states 
orthogonal to it? Sometimes in quantum computing, it’s easier to back-into the 
desired solution by sneaking up on it indirectly, and this turns out to be the case in 
Simon’s problem. You can try to think of ways to get a more directly, and if you 
find an approach that works with better computational complexity, you may have 
discovered a new quantum algorithm. Let us know. 


Because the states orthogonal to a are so much more likely than those that are not,
we will quickly get a linearly independent set of n − 1 equations with n − 1 unknowns,
namely, $a \cdot w_k = 0$, for k = 0, ..., n − 2. We then augment this system instantly (using
a direct classical technique) with an nth linearly independent equation, at which point
we can solve the full, non-degenerate n × n system for a using fast and well-known
techniques.
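To give a feel for the classical post-processing, here is a hedged Python sketch of solving the mod-2 system. It takes a slightly different route than the "augment with an nth equation" technique mentioned above: it row-reduces the n − 1 equations over mod-2 arithmetic and reads a off the single free bit. The names `dot2` and `solve_period` are hypothetical, and each $w_k$ is encoded as an n-bit integer.

```python
def dot2(u, v):
    """Mod-2 dot product of two n-bit integers."""
    return bin(u & v).count("1") % 2


def solve_period(rows, n):
    """Return the nonzero a with dot2(w, a) == 0 for every w in rows.

    Assumes rows holds n-1 linearly independent n-bit vectors.
    """
    pivots = {}                          # pivot bit position -> reduced row
    for w in rows:
        for bit, row in pivots.items():  # remove existing pivot bits from w
            if (w >> bit) & 1:
                w ^= row
        if w == 0:
            continue                     # dependent row, contributes nothing
        bit = w.bit_length() - 1         # highest set bit is the new pivot
        for b in list(pivots):           # clear the new pivot from older rows
            if (pivots[b] >> bit) & 1:
                pivots[b] ^= w
        pivots[bit] = w
    free = [b for b in range(n) if b not in pivots]
    if len(free) != 1:
        raise ValueError("need exactly n-1 independent equations")
    f = free[0]
    a = 1 << f                           # set the free bit of a to 1
    for bit, row in pivots.items():      # every pivot bit is then forced
        if (row >> f) & 1:
            a |= 1 << bit
    return a


# Hypothetical data: n = 3, and the vectors 010 and 111 are both orthogonal to a = 101.
print(solve_period([0b010, 0b111], 3))   # prints 5
```

Setting the free bit to 1 is what excludes the trivial solution a = 0, which satisfies every equation but is ruled out by the definition of the period.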


18.5 The Circuit Breakdown 


We need to break the circuit into sections to analyze it. Here is the segmentation:

[Circuit diagram: the same circuit as above, with two access points, A and B, marked beneath it.]


18.6 Circuit Analysis Prior to Conceptual Measurement: Point B


We now travel carefully through the circuit and analyze each component and its
consequence.


18.6.1 Hadamard Preparation of the A Register 


This stage of the circuit is identical in both logic and intent with the Deutsch-Jozsa
and Bernstein-Vazirani circuits; it sets up quantum parallelism by producing a
perfectly mixed superposition state, enabling the oracle to act on f(x) for all possible x,
simultaneously.

[Circuit diagram: the same circuit, with the first gate of the A register labeled: Hadamard, $H^{\otimes n}$, in $\mathcal{H}_{(n)}$.]


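It never hurts to review the general definition of a gate like $H^{\otimes n}$. For any CBS $|x\rangle^n$,
the $2^n$-dimensional Hadamard gate is expressed in encoded form using the formula,

$$ H^{\otimes n} |x\rangle^n = \left( \frac{1}{\sqrt{2}} \right)^n \sum_{y=0}^{2^n - 1} (-1)^{x \odot y} \, |y\rangle^n , $$

where $\odot$ is the mod-2 dot product. Today, I’ll be using the alternate vector notation,

$$ H^{\otimes n} |\mathbf{x}\rangle^n = \left( \frac{1}{\sqrt{2}} \right)^n \sum_{y=0}^{2^n - 1} (-1)^{\mathbf{x} \cdot \mathbf{y}} \, |\mathbf{y}\rangle^n , $$

where the dot product between vector x and vector y is also assumed to be the mod-2
dot product. In the circuit, we have

$$ |\mathbf{x}\rangle^n \xrightarrow{\; H^{\otimes n} \;} \left( \frac{1}{\sqrt{2}} \right)^n \sum_{y=0}^{2^n - 1} (-1)^{\mathbf{x} \cdot \mathbf{y}} \, |\mathbf{y}\rangle^n , $$

which, when applied to $|0\rangle^n$, reduces to

$$ |\mathbf{0}\rangle^n \xrightarrow{\; H^{\otimes n} \;} \left( \frac{1}{\sqrt{2}} \right)^n \sum_{y=0}^{2^n - 1} |\mathbf{y}\rangle^n , $$

or, returning to the usual computational basis notation, $|x\rangle^n$, for the summation index,

$$ |0\rangle^n \xrightarrow{\; H^{\otimes n} \;} \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x=0}^{2^n - 1} |x\rangle^n . $$

You'll recognize the output state of this Hadamard operator as the nth order x-basis
CBS ket, $|0\rangle^n_\pm$. It reminds us that not only do Hadamard gates provide quantum
parallelism but they double as a $z \leftrightarrow x$ basis conversion operator.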
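The Hadamard formula, $H^{\otimes n}|x\rangle^n = (1/\sqrt{2})^n \sum_y (-1)^{x \cdot y} |y\rangle^n$, is easy to tabulate numerically for small n. The following Python sketch is illustrative only: the names `dot2`, `hadamard_on_cbs` and `hadamard` are hypothetical, and a state is represented as a plain list of $2^n$ real amplitudes indexed by the encoded integer.

```python
import math


def dot2(x, y):
    """Mod-2 dot product of the bit strings of two integers."""
    return bin(x & y).count("1") % 2


def hadamard_on_cbs(x, n):
    """Amplitudes of the n-fold Hadamard applied to the CBS ket |x>, indexed by y."""
    scale = (1 / math.sqrt(2)) ** n
    return [scale * (-1) ** dot2(x, y) for y in range(2 ** n)]


def hadamard(state, n):
    """Extend hadamard_on_cbs to an arbitrary amplitude list by linearity."""
    out = [0.0] * (2 ** n)
    for x, amp in enumerate(state):
        for y, h in enumerate(hadamard_on_cbs(x, n)):
            out[y] += amp * h
    return out


uniform = hadamard_on_cbs(0, 3)   # all eight amplitudes equal (1/sqrt 2)^3
print(uniform)
```

Applying `hadamard` to `hadamard_on_cbs(5, 3)` returns (up to rounding) the basis state with a single 1 in position 5, a quick reminder that the Hadamard gate is its own inverse.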


18.6.2 The Quantum Oracle on CBS Inputs 


Next, we look at this part of the circuit:

[Circuit diagram: the same circuit, with the $U_f$ block labeled “Quantum Oracle”.]


Due to the increased B channel width, we had better review the precise definition of
the higher dimensional oracle. It’s based on CBS kets going in,

$$ |x\rangle^n \, |y\rangle^n \xrightarrow{\; U_f \;} |x\rangle^n \, |y \oplus f(x)\rangle^n , $$

and from there we extend to general input states, linearly. We actually constructed
the matrix of this oracle and proved it to be unitary in our lesson on quantum oracles.
Today, we need only consider the case of y = 0:

$$ |x\rangle^n \, |0\rangle^n \xrightarrow{\; U_f \;} |x\rangle^n \, |f(x)\rangle^n . $$


In Words


We are


1. taking the B register CBS input $|y\rangle^n$, which is $|0\rangle^n$ in this case, and extracting
the integer representation of y, namely 0,


2. applying f to the integer x (of the A register CBS $|x\rangle^n$) to form f(x),


3. noting that both 0 and f(x) are $\in \mathbb{Z}_{2^n}$,


4. forming the mod-2 sum of these two integers, $0 \oplus f(x)$, which, of course is f(x),
and


5. using the result to define the output of the oracle’s B register, $|f(x)\rangle^n$.


6. Finally, we recognize the output to be a separable state of the two output
registers, $|x\rangle^n \, |f(x)\rangle^n$.


Just to remove any lingering doubts, assume n = 5, x = 18, and f(18) = 7. Then
the above process yields


1. $|0\rangle^5 \longrightarrow 0$


2. $f(18) = 7$


3. $0 \oplus 7 = 00000 \oplus 00111 = 00111 = 7$


4. $7 \longrightarrow |7\rangle^5$


5. $|18\rangle^5 \otimes |0\rangle^5 \xrightarrow{\; U_f \;} |18\rangle^5 \otimes |7\rangle^5$
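On CBS inputs the oracle is just a classical, reversible rule on pairs of integers, which the following Python sketch imitates. The name `oracle` is hypothetical, and the stand-in `f` is defined only at x = 18 because f(18) = 7 is the one value the example supplies.

```python
def oracle(x, y, f):
    """Classical action of U_f on the CBS pair |x>|y>: returns (x, y XOR f(x))."""
    return x, y ^ f(x)


# Only f(18) = 7 is given in the example, so this stand-in f knows nothing else.
f = lambda x: {18: 7}[x]

print(oracle(18, 0, f))               # prints (18, 7)
print(oracle(*oracle(18, 0, f), f))   # prints (18, 0)
```

The second call shows that applying the rule twice restores the input, which is the classical shadow of the oracle being its own inverse.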


18.6.3 The Quantum Oracle on Hadamard Superposition Inputs


Next, we invoke linearity for the maximally mixed superposition state $|0\rangle^n_\pm$, going into
the oracle’s top register.

Reminder. I’ve said it before, but it’s so important in a first course such as
this that I'll repeat myself. The bottom register’s output is not f applied to the
superposition state. f only has meaning over its domain $\mathbb{Z}_{2^n}$, which corresponds to
the finite set of z-basis CBS kets $\{ |x\rangle \}$. It has no meaning when applied to sums,
especially weighted sums (by real or complex amplitudes) of these preferred CBSs.


By linearity, $U_f$ distributes over all the terms in the maximally mixed input $|0\rangle^n_\pm$,
at which point we apply the result of the last subsection, namely

$$ U_f \left( |x\rangle^n \, |0\rangle^n \right) = |x\rangle^n \, |f(x)\rangle^n , $$

to the individual summands to find that

$$ \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x=0}^{2^n - 1} |x\rangle^n \, |0\rangle^n \xrightarrow{\; U_f \;} \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x=0}^{2^n - 1} |x\rangle^n \, |f(x)\rangle^n . $$

This is a weighted sum of separable products. (The weights are the same for each
separable product in the sum: $(1/\sqrt{2})^n$.) That sum is not, as a whole, separable,
which makes it impossible to visualize directly on the circuit diagram unless we
combine the two outputs into a single, entangled, output register. However we do have
an interpretation that relates to the original circuit.


• The output state is a superposition of separable terms $|x\rangle^n \, \frac{|f(x)\rangle^n}{(\sqrt{2})^n}$. But this is
exactly the kind of sum the generalized Born rule needs,

$$ |\varphi\rangle^{n+m} = |0\rangle^n_A \, |\psi_0\rangle^m_B + |1\rangle^n_A \, |\psi_1\rangle^m_B + \cdots + |2^n - 1\rangle^n_A \, |\psi_{2^n - 1}\rangle^m_B , $$

so an A-measurement of “x” would imply a B-state collapse to its (normalized)
partner, $|f(x)\rangle^n$.


• A similar conclusion would be drawn if we chose to measure the B register first,
but we'll get to that slightly more complicated alternative in a moment. (Sneak
preview: there are two, not one, pre-images x for every f(x) value.)


• Each of the $2^n$ orthogonal terms in the superposition has amplitude $(1/\sqrt{2})^n$,
so the probability that a measurement by A will collapse the superposition to
any one of them is $\left( (1/\sqrt{2})^n \right)^2 = 1/2^n$.
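For a small n the whole post-oracle state can be written down explicitly, which makes the $1/2^n$ probability easy to confirm. A minimal Python sketch follows; the name `post_oracle_state` and the sample periodic function are hypothetical, and the state is stored as a dict from the pair (x, f(x)) to its amplitude.

```python
import math


def post_oracle_state(f, n):
    """Return {(x, f(x)): amplitude} for the state leaving the oracle."""
    amp = (1 / math.sqrt(2)) ** n
    return {(x, f(x)): amp for x in range(2 ** n)}


f = lambda x: min(x, x ^ 5)          # hypothetical periodic f with a = 5, n = 3
state = post_oracle_state(f, 3)
probs = {key: amp ** 2 for key, amp in state.items()}

print(len(state))                    # prints 8
print(sum(probs.values()))           # very close to 1.0
```

Each of the eight keys is a distinct pair even though every f value shows up twice, which is the point of the orthogonality exercise that follows.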


[Exercise. Why are the terms orthogonal? Hint: inner product of tensors.]


[Exercise. Look at the sum in our specific situation: $\sum_{x=0}^{2^n - 1} |x\rangle^n \, |f(x)\rangle^n$. QM
Trait #6 (Probability of Outcomes) assumes we start with an expansion along some
computational basis, but this expansion only has $2^n$ terms, so can’t be a basis for
$A \otimes B$ which has $2^n \cdot 2^n = 2^{2n}$ basis vectors. Why can we claim that the scalars
$(1/\sqrt{2})^n$ are amplitudes of collapse to one of these states? Hint: While the kets
in the sum do not comprise a full basis, they are all distinct CBS kets. (Are they
distinct? Each f(x) appears twice in the sum. But that doesn’t destroy the linear
independence of the tensor CBS kets in that sum, because ... .) Therefore, we can
add the missing CBS kets into the sum as long as we accompany them by 0 scalar
weights. Now we can apply Trait #6.]


[Exercise. Look at the more general sum in the Born rule: $\sum_{x=0}^{2^n - 1} |x\rangle^n_A \, |\psi_x\rangle^m_B$.
QM Trait #6 (Probability of Outcomes) assumes we start with an expansion along
some computational basis, but this expansion only has $2^n$ terms, so can’t be a basis
for $A \otimes B$ which has $2^n \cdot 2^m = 2^{n+m}$ basis vectors. Why can we claim that the scalars
$(1/\sqrt{2})^n$ are amplitudes of collapse to one of these states? Hint: Force this to be
an expansion along the tensor CBS by expanding each $|\psi_x\rangle^m_B$ along the B-basis then
distributing the $|x\rangle^n_A$s. Now we meet the criterion of Trait #6.]


18.6.4 Partitioning the Domain into Cosets 


It’s time to use the fact that f is $\mathbb{Z}_{2^n}$ periodic with (unknown) period a to help us
rewrite the output of the Oracle’s B register prior to the conceptual measurement.
$\mathbb{Z}_{2^n}$ periodicity tells us that the domain can be partitioned (in more than one way)
into two disjoint sets, R and Q,

$$ \mathbb{Z}_{2^n} = R \cup Q = \{ \ldots, r_k, \ldots \} \cup \{ \ldots, q_k, \ldots \} . $$


Cosets 


Q can be written as $R \oplus a$, that is, we “translate” the entire subset R by adding a
number to every one of its members resulting in a new subset, Q. Mathematicians
say that $Q = R \oplus a$ is a coset of the set R. Notice that R is a coset of itself since
$R = R \oplus 0$. In this case, the two cosets, R and $Q = R \oplus a$, partition the domain of f
into two equal-sized, disjoint sets.
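One concrete way to produce such a partition is sketched below in Python. It is an illustration under an arbitrary convention: the hypothetical helper `partition` always puts the numerically smaller member of each pair into R, which is just one of the many valid ways to split the domain.

```python
def partition(n, a):
    """Split Z_{2^n} into R and Q = R XOR a, putting the smaller of each pair in R."""
    R = [x for x in range(2 ** n) if x < x ^ a]   # ^ binds tighter than <
    Q = [x ^ a for x in R]                        # translate R by a
    return R, Q


R, Q = partition(3, 5)
print(R)   # prints [0, 1, 2, 3]
print(Q)   # prints [5, 4, 7, 6]
```

Together the two lists cover all eight domain values exactly once, and each element of Q is the partner of the element of R in the same position.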




18.6.5 Rewriting the Output of the Oracle’s B Register 


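Our original expression for the oracle’s complete entangled output was

$$ \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x=0}^{2^n - 1} |x\rangle^n \, |f(x)\rangle^n , $$

but our new partition of the domain will give us a propitious way to rewrite this.
Each element $x \in R$ has a unique partner in Q satisfying

$$ \left. \begin{matrix} x \\ x \oplus a \end{matrix} \right\} \overset{f}{\longmapsto} f(x). $$

Using this fact, we only need to sum the B register output over R (half as big as $\mathbb{Z}_{2^n}$)
and include both x and $x \oplus a$ in each term,

$$ \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x=0}^{2^n - 1} |x\rangle^n \, |f(x)\rangle^n = \left( \frac{1}{\sqrt{2}} \right)^n \sum_{x \in R} \left( |x\rangle^n + |x \oplus a\rangle^n \right) |f(x)\rangle^n $$

$$ = \left( \frac{1}{\sqrt{2}} \right)^{n-1} \sum_{x \in R} \left( \frac{|x\rangle^n + |x \oplus a\rangle^n}{\sqrt{2}} \right) |f(x)\rangle^n . $$

I moved one of the factors of $1/\sqrt{2}$ into the sum so we could see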


1. how the new sum consists of normalized states (length 1), and 


2. the common amplitude remaining on the outside, nicely produces a normalized 
state overall, since now there are half as many states in the sum, but each state 
has twice the probability as before. 


Swapping the roles of channels A and B for the Born Rule 


The last rearrangement of the sum had a fascinating consequence. While the terms 
still consist of separable products from the A and B channels, now it is the B channel 
that has basis CBS kets, and the A channel that does not. 


How can we see this? Each $|f(x)\rangle^n$ was always a CBS state (f(x) is an integer
from 0 to $2^n - 1$ and so corresponds to a CBS ket), but the original sum was plagued
by its appearing twice (destroying potential linear independence of the terms), so we
couldn’t view the sum as an expansion along the B-basis. After consolidating all
the pre-image pairs, we now have only one $|f(x)\rangle^n$ term for each x in the sum: we
have factored the expansion along the B-basis. In the process, the A-factors became
superpositions of A CBS kets, now mere $|\psi_x\rangle^n_A$s in the general population. This
reverses the roles of A and B and allows us to talk about measuring B along the
CBS, with that measurement selecting one of the non-CBS A factors.




The upshot is that we can apply the Born rule in reverse; we’ll be measuring the B 
register and forcing the A register to collapse into one of its “binomial” superpositions. 
Let’s do it. But first, we should give recognition to a reusable design policy. 


The Lesson: $|0\rangle^n$ into the Oracle’s B Register


This all worked because we chose to send the CBS $|0\rangle^n$ into the oracle’s B register.
Any other CBS into that channel would not have created the nice terms $|x\rangle^n \, |f(x)\rangle^n$
of the oracle’s entangled output. After factoring out terms that had common $|f(x)\rangle^n$
components in the B register, we were in a position to collapse along the B-basis and
pick out the attached sum in the A register.


Remember this. It’s a classic trick that can be tried when we want to select a 
small subset of A register terms from the large, perfectly mixed superposition in that 
register. It will typically lead to a probabilistic outcome that won’t necessarily settle 
the algorithm in a single evaluation of the oracle, but we expect it to give a valuable 
result that can be combined with a few more evaluations (loop passes) of the oracle. 
This is the lesson we learn today that will apply next time when we study Shor’s 
period-finding algorithm. 

In contrast, when we were looking for a deterministic solution in algorithms like
Deutsch-Jozsa and Bernstein-Vazirani, we fed a $|1\rangle$ into the B register and used the
phase kick-back to give us an answer in a single evaluation.


$|1\rangle$ into oracle’s B register $\longrightarrow$ phase kick-back,


$|0\rangle^n$ into oracle’s B register $\longrightarrow$ Born rule.


18.7 Analysis of the Remainder of the Circuit: Measurements


18.7.1 Hypothetical Measurement of the B Register 


Although we won’t really need to do so, let’s imagine what happens if we were to
apply the generalized Born rule now using the rearranged sum (that turned the B
channel into the CBS channel).

[Circuit diagram: the same circuit, with the B register measurement labeled “Conceptual”.]

Each B register measurement of “f(x)” will be attached to not one, but two, input A
register states. Thus, measuring B first, while collapsing A, actually produces merely
a superposition in that register, not a single, unique x from the domain. It narrows
things down considerably, but not completely,

$$ \left( \frac{1}{\sqrt{2}} \right)^{n-1} \sum_{x \in R} \left( \frac{|x\rangle^n + |x \oplus a\rangle^n}{\sqrt{2}} \right) |f(x)\rangle^n \;\; \searrow \;\; \left( \frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} \right) |f(x_0)\rangle^n $$

(Here, $\searrow$ means collapses to.)


Well that’s good, great and wonderful, but if after measuring the post-oracle B
register, we were to measure line A, it would collapse to one of two states, $|x_0\rangle$ or $|x_0 \oplus a\rangle$,
but we wouldn’t know which nor would we know its unsuccessful companion (the one
to which the state didn’t collapse). There seems to be no usable information here.
As a result we don’t measure A ... yet.


Let’s name the collapsed (but unmeasured) superposition state in the A register
$|\psi_{x_0}\rangle^n$, since it is determined by the measurement “$f(x_0)$” of the collapsed B register,

$$ |\psi_{x_0}\rangle^n \equiv \frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} . $$
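The conceptual B measurement can be imitated classically for small n, which may help make the two-term A state tangible. The Python sketch below is an illustration with hypothetical names (`measure_B`, and the sample function `f`); it relies on the fact that every x carries the same probability before the measurement, so picking x uniformly and reporting f(x) reproduces the B outcome statistics.

```python
import math
import random


def measure_B(f, n, rng=random):
    """Simulate the conceptual B measurement; return (outcome, collapsed A state)."""
    x0 = rng.randrange(2 ** n)          # every x is equally likely
    w = f(x0)                           # the B register reads "f(x0)"
    preimages = [x for x in range(2 ** n) if f(x) == w]
    amp = 1 / math.sqrt(len(preimages))
    return w, {x: amp for x in preimages}   # A is left in an equal superposition


f = lambda x: min(x, x ^ 5)             # hypothetical periodic f with a = 5, n = 3
outcome, a_state = measure_B(f, 3, random.Random(0))
print(outcome, a_state)
```

Whatever outcome is drawn, the returned A state has exactly two entries, x0 and x0 XOR 5, each with amplitude $1/\sqrt{2}$; reading either one alone would not reveal the period.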


Guiding Principle: Narrow the Field. We stand back and remember this 
stage of the analysis for future use. Although a conceptual measurement of B does not 
produce an individual CBS ket in register A, it does result in a significant narrowing 
of the field. This is how the big remaining quantum algorithms in this course will 
work. 


18.7.2 Effect of a Final Hadamard on A Register 


In an attempt to coax the superposition ket $|\psi_{x_0}\rangle^n \in \mathcal{H}_{(n)}$ to cough up useful information, we take $H^{\otimes n} |\psi_{x_0}\rangle^n$. This requires that we place an $H^{\otimes n}$ gate at the output of the oracle's A register:


[Circuit diagram: A register $|0\rangle^n \to H^{\otimes n} \to U_f \to H^{\otimes n}$; B register $|0\rangle^n \to U_f$.]


[Apology. I can't offer a simple reason why anyone should be able to "intuit" a Hadamard as the post-oracle operator we need. Unlike Deutsch-Jozsa, today we are not measuring along the z-basis, our motivation for the final $H^{\otimes n}$ back then. However, there is a small technical theorem about a Hadamard applied to a binomial superposition of CBSs of the form $(|x\rangle^n + |y\rangle^n)/\sqrt{2}$ which is relevant, and perhaps this inspired Simon and his compadres.]




Continuing under the assumption that we measure an $f(x_0)$ at the B register output, thus collapsing both registers, we go on to work with the resulting superposition $|\psi_{x_0}\rangle^n$ in the A register. Let's track its progress as we apply the Hadamard gate to it. As with all quantum gates, $H^{\otimes n}$ is linear so moves past the sum, and we get


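$$\frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} \;\xrightarrow{\;H^{\otimes n}\;}\; \frac{H^{\otimes n}|x_0\rangle^n + H^{\otimes n}|x_0 \oplus a\rangle^n}{\sqrt{2}}.$$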


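The Hadamard of the individual terms is

$$H^{\otimes n}|x_0\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{y \cdot x_0}\, |y\rangle^n$$

and

$$H^{\otimes n}|x_0 \oplus a\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{y \cdot (x_0 \oplus a)}\, |y\rangle^n.$$

Focus on the integer coefficient

$$(-1)^{y \cdot (x_0 \oplus a)}.$$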


This can be simplified by observing two facts 


1. The mod-2 dot product distributes over the mod-2 sum $\oplus$, and

2. $(-1)^{p \oplus q} = (-1)^{p}(-1)^{q}$, for $p$ and $q$ in $\mathbb{Z}_{2^n}$. (Danger: while it's true for a base of $-1$, it is not for a general complex base $c$.)

[Exercise. Prove the second identity. One idea: Break it into two cases, $p \oplus q$ even and $p \oplus q$ odd, then consider each one separately.]

[Exercise. Find a complex $c$ for which $c^{\,p \oplus q} \neq c^{\,p}\, c^{\,q}$. Hint: Use the simplest case, $n = 1$.]


Combining both facts, we get

$$(-1)^{y \cdot (x_0 \oplus a)} = (-1)^{y \cdot x_0}\,(-1)^{y \cdot a},$$


so

$$\frac{H^{\otimes n}|x_0\rangle^n + H^{\otimes n}|x_0 \oplus a\rangle^n}{\sqrt{2}} = \left(\frac{1}{\sqrt{2}}\right)^{n+1} \sum_{y=0}^{2^n-1} (-1)^{y \cdot x_0} \left( 1 + (-1)^{y \cdot a} \right) |y\rangle^n.$$
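[Optional check. The two facts, and the combined identity they produce, can be spot-checked by machine for a small case. What follows is a minimal Python sketch for $n = 4$ only; it is a sanity check, not a proof, and the helper name `dot2` is my own label for the mod-2 dot product on encoded integers.]

```python
from itertools import product

def dot2(y, x):
    """Mod-2 dot product of two integers read as bit vectors in (Z_2)^n."""
    return bin(y & x).count("1") % 2

n = 4
N = 2 ** n

# Fact 1: the mod-2 dot product distributes over XOR.
fact1 = all(dot2(y, x ^ a) == dot2(y, x) ^ dot2(y, a)
            for y, x, a in product(range(N), repeat=3))

# Fact 2: (-1)^(p XOR q) = (-1)^p * (-1)^q for encoded integers p, q.
fact2 = all((-1) ** (p ^ q) == (-1) ** p * (-1) ** q
            for p, q in product(range(N), repeat=2))

# Combined: (-1)^(y.(x XOR a)) = (-1)^(y.x) * (-1)^(y.a).
combined = all((-1) ** dot2(y, x ^ a) == (-1) ** dot2(y, x) * (-1) ** dot2(y, a)
               for y, x, a in product(range(N), repeat=3))

print(fact1, fact2, combined)
```

All three flags should come out true; a single false would mean one of the identities was transcribed incorrectly.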


18.7.3 The Orthogonality of A Register Output Relative to 
the Unknown Period a 


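Here's where we are:

$$H^{\otimes n}\left( \frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} \right) = \left(\frac{1}{\sqrt{2}}\right)^{n+1} \sum_{y=0}^{2^n-1} (-1)^{y \cdot x_0} \left( 1 + (-1)^{y \cdot a} \right) |y\rangle^n.$$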


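The integer expression in the parentheses is seen to be

$$1 + (-1)^{y \cdot a} = \begin{cases} 0, & \text{if } y \cdot a = 1 \pmod 2 \\ 2, & \text{if } y \cdot a = 0 \pmod 2, \end{cases}$$

so we can omit all those 0 terms which correspond to $y \cdot a = 1 \pmod 2$, leaving

$$H^{\otimes n}\left( \frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} \right) = \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{\substack{y \cdot a = 0 \\ (\mathrm{mod}\ 2)}} (-1)^{y \cdot x_0}\, |y\rangle^n.$$

Note that the sum is now over only $2^{n-1}$, or exactly half, of the original $2^n$ CBSs.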
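[Optional check. For readers who like to see the amplitudes, here is a small Python sketch that applies $H^{\otimes n}$ numerically to $(|x_0\rangle^n + |x_0 \oplus a\rangle^n)/\sqrt{2}$ and inspects which CBS kets survive. The values $n = 4$, $a = 5$ and $x_0 = 3$ are arbitrary choices of mine, and `hadamard_n` is a hypothetical helper that uses the standard fast Walsh-Hadamard butterfly rather than building the $2^n \times 2^n$ matrix.]

```python
import math

def dot2(y, x):
    """Mod-2 dot product of two integers read as bit vectors."""
    return bin(y & x).count("1") % 2

def hadamard_n(amps):
    """Apply H^{(x)n} to a list of 2^n amplitudes (fast Walsh-Hadamard transform)."""
    out = list(amps)
    size = len(out)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                u, v = out[i], out[i + h]
                out[i] = (u + v) / math.sqrt(2)
                out[i + h] = (u - v) / math.sqrt(2)
        h *= 2
    return out

n, a, x0 = 4, 0b0101, 0b0011          # illustrative values, not from the text
state = [0.0] * 2 ** n
state[x0] = state[x0 ^ a] = 1 / math.sqrt(2)

result = hadamard_n(state)
survivors = [y for y, amp in enumerate(result) if abs(amp) > 1e-12]
expected_size = (1 / math.sqrt(2)) ** (n - 1)

print("surviving y:", survivors)
print("amplitudes :", [round(result[y], 5) for y in survivors])
```

The surviving $y$ should be exactly the eight values with $y \cdot a = 0$, each carrying amplitude $\pm (1/\sqrt{2})^{n-1}$ with sign $(-1)^{y \cdot x_0}$.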


[Exercise. Show that, for any fixed non-zero number $a \in \mathbb{Z}_{2^n}$ (or its equivalent vector $\mathbf{a} \in (\mathbb{Z}_2)^n$), the set of all $x$ with $x \odot a = 0$ (or $\mathbf{x}$ with $\mathbf{x} \cdot \mathbf{a} = 0$) is exactly half the numbers (vectors) in the set.]


Mod-2 Orthogonality vs. Hilbert Space Orthogonality 


Avoid confusion. Don't forget that these dot products (like $y \cdot a$) are mod-2 dot products of vectors in $(\mathbb{Z}_2)^n$. This has nothing to do with the Hilbert space inner product $\langle y | a \rangle$, an operation on quantum states.




When we talk about orthogonality of vectors or numbers at this stage of the 
analysis, we mean the mod-2 dot product, not the state space inner product. Indeed, 
there is nothing new or interesting about Hilbert space orthogonality at this juncture: 
CBS kets always form an orthonormal set, so each one has inner product = 1 with 
itself and inner product = 0 with all the rest. However, that fundamental fact doesn’t 
help us here. The mod-2 dot product does. 


Pause for an Example 


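If $n = 4$, and

$$a = 5 \equiv \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix},$$

then

$$\{\, y \;|\; y \cdot a = 0 \pmod 2 \,\} = \left\{ \begin{pmatrix} y_3 \\ 0 \\ y_1 \\ 0 \end{pmatrix} : y_1, y_3 \in \{0,1\} \right\} \cup \left\{ \begin{pmatrix} y_3 \\ 1 \\ y_1 \\ 1 \end{pmatrix} : y_1, y_3 \in \{0,1\} \right\},$$

which consists of eight of the original $2^4$ = sixteen $(\mathbb{Z}_2)^4$ vectors associated with the CBSs. Therefore, there are $2^{n-1} = 2^3 = 8$ terms in the sum, exactly normalized by the $(1/\sqrt{2})^{n-1} = (1/\sqrt{2})^3 = 1/\sqrt{8}$ out front.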

Note: In this case, a is in the set of mod-2 vectors which are orthogonal to it. 


[Exercise. Characterize all $a$ which are orthogonal to themselves, and explain why this includes half of all states in our total set.]


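Returning to our derivation, we know that once we measure the B register and collapse the states into those associated with a specific $|f(x_0)\rangle$, the A register can be post-processed with a Hadamard gate to give

$$\frac{|x_0\rangle^n + |x_0 \oplus a\rangle^n}{\sqrt{2}} \;\xrightarrow{\;H^{\otimes n}\;}\; \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{\substack{y \cdot a = 0 \\ (\mathrm{mod}\ 2)}} (-1)^{y \cdot x_0}\, |y\rangle^n.$$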


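All of the vectors in the final A register superposition are orthogonal to $a$, so we can now safely measure that superposition and get some great information:

[Circuit diagram: A register $|0\rangle^n \to H^{\otimes n} \to U_f \to H^{\otimes n} \to$ measurement, yielding $|y_0\rangle$ with $y_0 \perp a$; B register $|0\rangle^n \to U_f$.]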




We don't know which $y_0$ among the $2^{n-1}$ $y$s we will measure; that depends on the whimsy of the collapse, and they're all equally likely. However, we just showed that they're all orthogonal to $a$, including $y_0$.


Warning(s): y = 0 Possible 


This is one possible snag. We might get a 0 when we test the A register. It is a possible outcome, since $0 \equiv \mathbf{0}$ is mod-2 orthogonal to everything. The probability is low, $1/2^{n-1}$ (one of $2^{n-1}$ equally likely outcomes), and you can test for it and throw it back if that happens. We'll account for this in our probabilistic analysis, further down. While we're at it, remember that $a$, itself, might get measured, but that's okay. We won't know it's $a$, and the fact that it might be won't change a thing that follows.


18.7.4 Foregoing the Conceptual Measurement 


We haven’t yet spoken of the actual A register measurement without first measur- 
ing the B register. Now that you’ve seen the case with the simplifying B register 
measurement, you should be able to follow this full development that omits that 
step. 


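If we don't measure B first, then we can't say we have collapsed into any particular $f(x_0)$ state. So the oracle's output must continue to carry the full entangled summation

$$\left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{x \in R} \left( \frac{|x\rangle^n + |x \oplus a\rangle^n}{\sqrt{2}} \right) |f(x)\rangle^n$$

through the final $\left[ H^{\otimes n} \otimes \mathbb{1}^{\otimes n} \right]$. This would add an extra sum, $\sum_{x \in R}$, to all of our expressions:

$$\left[ H^{\otimes n} \otimes \mathbb{1}^{\otimes n} \right] \left( \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{x \in R} \left( \frac{|x\rangle^n + |x \oplus a\rangle^n}{\sqrt{2}} \right) |f(x)\rangle^n \right)$$

$$= \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{x \in R} \left[ H^{\otimes n} \left( \frac{|x\rangle^n + |x \oplus a\rangle^n}{\sqrt{2}} \right) \right] |f(x)\rangle^n$$

$$= \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{x \in R} \left[ \frac{H^{\otimes n}|x\rangle^n + H^{\otimes n}|x \oplus a\rangle^n}{\sqrt{2}} \right] |f(x)\rangle^n.$$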


We can now cite the — still valid — result that led to expressing the Hadamard fraction 




as a sum of CBS kets satisfying y - a = 0 (mod 2), so the last expression is 


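$$\left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{x \in R} \left[ \left(\frac{1}{\sqrt{2}}\right)^{n-1} \sum_{\substack{y \cdot a = 0 \\ (\mathrm{mod}\ 2)}} (-1)^{y \cdot x}\, |y\rangle^n \right] |f(x)\rangle^n$$

$$= \left(\frac{1}{\sqrt{2}}\right)^{2n-2} \sum_{\substack{y \cdot a = 0 \\ (\mathrm{mod}\ 2)}} |y\rangle^n \left( \sum_{x \in R} (-1)^{y \cdot x}\, |f(x)\rangle^n \right).$$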

While our double sum has more overall terms than before, they are all confined to those $y$ which are (mod-2) orthogonal to $a$. In fact, we don't have to apply the Born rule this time, because all that we're claiming is an A register collapse to one of the $2^{n-1}$ CBS kets $|y\rangle^n$ which we get compliments of quantum mechanics: third postulate + post-measurement collapse.


Therefore, when we measure only the A register of this larger superposition, the 
collapsed state is still some y orthogonal to a. 
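[Optional check. The claim that the B register never needs to be measured can be tested on a toy oracle. The Python sketch below computes, exactly, the probability of each A register outcome for the full entangled state. The function $f(x) = \min(x, x \oplus a)$ is a hypothetical stand-in of mine for a two-to-one function with period $a$, and `a_register_distribution` is likewise my own helper name.]

```python
from fractions import Fraction

def dot2(y, x):
    """Mod-2 dot product of two integers read as bit vectors."""
    return bin(y & x).count("1") % 2

def a_register_distribution(n, f):
    """Exact A-register outcome probabilities after the final H^{(x)n},
    with the B register left unmeasured."""
    N = 2 ** n
    probs = []
    for y in range(N):
        # The amplitude of |y>|v> is (1/2^n) * sum over {x : f(x) = v} of (-1)^(y.x).
        sums = {}
        for x in range(N):
            v = f(x)
            sums[v] = sums.get(v, 0) + (-1) ** dot2(y, x)
        # Summing |amplitude|^2 over every B value v gives the marginal for y.
        probs.append(sum(Fraction(s * s, N * N) for s in sums.values()))
    return probs

n, a = 4, 0b0101                      # illustrative period, a = 5

def f(x):
    """Hypothetical 2-to-1 oracle function with f(x) = f(x XOR a)."""
    return min(x, x ^ a)

probs = a_register_distribution(n, f)
for y, p in enumerate(probs):
    print(format(y, "04b"), dot2(y, a), p)
```

Every $y$ with $y \cdot a = 0$ should show probability $1/2^{n-1}$, and every other $y$ should show 0, in agreement with the double-sum expression.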


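[Circuit diagram: A register $|0\rangle^n \to H^{\otimes n} \to U_f \to H^{\otimes n} \to$ measurement, yielding $|y\rangle$ with $y \perp a$; B register $|0\rangle^n \to U_f$, unmeasured.]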


A Small Change in Notation 


Because I often use the variable $y$ for general CBS states, $|y\rangle$, or summation variables, $\sum_y$, I'm going to switch to the variable $z$ for the measured orthogonal output state, as in $|z\rangle$, $z \perp a$. We'll then have a mental cue for the rest of the lecture, where $z$ will always be a mod-2 vector orthogonal to $a$. With this last tweak, our final circuit result is

[Circuit diagram: A register $|0\rangle^n \to H^{\otimes n} \to U_f \to H^{\otimes n} \to$ measurement, yielding $|z\rangle$ with $z \perp a$; B register $|0\rangle^n \to U_f$.]


18.8 Circuit Analysis Conclusion 


In a single application of the circuit, we have found our first $z$, with $z \perp a$. We would like to find $n - 1$ such vectors, all linearly independent as a set, so we anticipate sampling the circuit several more times. How long will it take us to be relatively certain we have $n - 1$ independent vectors? We explore this question next.

I will call the set of all vectors orthogonal to $a$ either $\mathbf{a}^\perp$ (if using vector notation) or $a^\perp$ (if using $\mathbb{Z}_{2^n}$ notation). It is pronounced "a-perp."


[Exercise. Working with the vector space $(\mathbb{Z}_2)^n$, show that $\mathbf{a}^\perp$ is a vector subspace. (For our purposes, it's enough to show that it is closed under the $\oplus$ operation.)]

[Exercise. Show that those vectors which are not orthogonal to $\mathbf{a}$ do not form a subspace.]


18.9 Simon’s Algorithm 


18.9.1 Producing n — 1 Linearly Independent Vectors 


[Notational Alert. By now, we're all fluent in translating from the vectors ($\mathbf{a}$, $\mathbf{w}_k$) of $(\mathbb{Z}_2)^n$ to the encoded decimals ($a$, $w_k$) of $\mathbb{Z}_{2^n}$ notation, so be ready to see me switch between the two depending on the one I think will facilitate your comprehension. I'll usually use encoded decimals and give you notational alerts when using vectors.]


What we showed is that we can find a vector orthogonal to $a$ in a single application of our circuit. We need, however, to find not one, but $n - 1$, linearly-independent vectors, $z$, that are orthogonal to $a$. Does repeating the process $n - 1$ times do the trick? Doing so would certainly manufacture

$$\{\, z_0,\ z_1,\ z_2,\ \ldots,\ z_{n-2} \,\}$$

with

$$z_k \perp a \quad \text{for } k = 0, \ldots, n-2;$$

however, that's not quite good enough. Some of the $z_k$ might be a linear combination of the others or even be repeats. For this to work, we'd need each one to not only be orthogonal to $a$, but linearly independent of all the others as well.


Pause for an Example 


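If $n = 4$, and

$$\mathbf{a} = 5 \equiv \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix},$$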
suppose that the circuit produced the three vectors 
0 0 0 
1 0 1 
Be ee at eee a 
1 0 1 




after three circuit invocations. While all three are orthogonal to a, they do not form 
a linearly-independent set. In this case, 3 = n—1 was not adequate. Furthermore, a 
fourth or fifth might not even be enough (if, say, we got some repeats of these three). 

Therefore, we must perform this process $m$ times, $m \geq n - 1$, until we have $n - 1$ linearly-independent vectors. (We don't have to say that they must be orthogonal to $a$, since the circuit construction already guarantees this.) How large must $m$ be, and can we ever be sure we will succeed?


18.9.2 The Algorithm 


We provide a general algorithm in this section. In the sections that follow, we'll see that we are guaranteed to succeed in polynomial time $O(n^4)$ with probability arbitrarily close to 1. Finally, I'll tweak the algorithm just a bit and arrive at an implementation that achieves $O(n^3)$ performance.

Whether it's $O(n^4)$, $O(n^3)$ or any similar big-O complexity, it will be a significant relative speed-up over classical computing which has exponential growth, even if we accept a non-deterministic classical solution. (We'll prove that in the final section of this lesson.) The problem is hard, classically, but easy quantum mechanically.

Here is Simon's algorithm for $\mathbb{Z}_{2^n}$-period finding. To be sure that it gives a polynomial time solution, we must eventually verify that


1. we only have to run the circuit a polynomial number of times (in n) to get 
n — 1 linearly independent zs which are orthogonal to a with arbitrarily good 
confidence, and 


2. the various classical tasks — like checking that a set of vectors is linearly inde- 
pendent and solving a series of n equations — are all of polynomial complexity 
in n, individually. 


We'll do all that in the following sections, but right now let’s see the algorithm: 


• Select an integer, $T$, which reflects an acceptable failure probability of $1/2^T$.

• Initialize a set $W$ to the empty set. $W$ will eventually contain a growing number of $(\mathbb{Z}_2)^n$ vectors.

• Repeat the following loop at most $n + T$ times.

    1. Apply Simon's circuit.

    2. Measure the output of the final $H^{\otimes n}$ to get $z$.

    3. Use a classical algorithm to determine whether or not $z$ is linearly dependent on the vectors in $W$.

        - If it is independent, name it $w_j$, where $j$ is the number of elements already stored in $W$, and add $w_j$ to $W$.

            * If $j = n - 2$, we have $n - 1$ linearly-independent vectors in $W$ and are done; break the loop.

        - If it is not independent (which includes the special case $z = 0$, even when $W$ is still empty), then continue to the next pass of the loop.

• If the above loop ended naturally (i.e., not from the break) after $n + T$ full passes, we failed.

• Otherwise, we succeeded. Add an $n$th vector, $w_{n-1}$, which is linearly independent of this set (and therefore not orthogonal to $a$, by a previous exercise), done easily using a simple classical observation, demonstrated below. This produces a system of $n$ independent equations satisfying

$$w_k \cdot a = \begin{cases} 0, & k = 0, \ldots, n-2 \\ 1, & k = n-1 \end{cases}$$

which has a unique non-zero solution.

• Use a classical algorithm to solve the system of $n$ equations for $a$.


Note: By supplying the nth vector (a fast, easy addition of cost O(n)), we get a full 
system requiring no extra quantum samples and guaranteeing that our system yields 
a, unequivocally. 
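[Optional sketch. To make the control flow concrete, here is a classical Python mock-up of the loop just described. It is a sketch under loud assumptions: `sample_circuit` fakes the quantum circuit by drawing a uniformly random $z \perp a$ (so it has to be told $a$, which a real run would not know); the independence test is an incremental XOR-basis reduction, one of several classical options; and the $n$th vector is taken to be a unit vector in the one column that holds no pivot, which is my choice and not necessarily the construction the text has in mind. All function names are hypothetical, and the sketch assumes $n \geq 2$.]

```python
import random

def dot2(y, x):
    """Mod-2 dot product of two integers read as bit vectors."""
    return bin(y & x).count("1") % 2

def sample_circuit(n, a, rng):
    """Classical stand-in for one circuit run: a uniformly random z with z . a = 0."""
    while True:
        z = rng.randrange(2 ** n)
        if dot2(z, a) == 0:
            return z

def reduce_against(z, basis):
    """Reduce z against an XOR basis keyed by leading-bit position.
    The result is 0 exactly when z is linearly dependent on the basis."""
    while z:
        top = z.bit_length() - 1
        if top not in basis:
            return z
        z ^= basis[top]
    return 0

def solve_for_period(basis, n):
    """Recover a from n-1 independent vectors that are all orthogonal to a."""
    rows = dict(basis)
    pivots = sorted(rows)
    for p in pivots:                    # clear pivot column p from every other row
        for q in pivots:
            if q != p and (rows[q] >> p) & 1:
                rows[q] ^= rows[p]
    free = next(b for b in range(n) if b not in rows)
    a = 1 << free                       # nth equation, e_free . a = 1, sets this bit
    for p in pivots:                    # row p now reads: a_p + rows[p][free] = 0
        if (rows[p] >> free) & 1:
            a |= 1 << p
    return a

def simon(n, a, T, rng):
    """Run at most n + T samples; return the recovered period, or None on failure."""
    basis = {}                          # plays the role of W (stored in reduced form)
    for _ in range(n + T):
        z = reduce_against(sample_circuit(n, a, rng), basis)
        if z:                           # independent of everything collected so far
            basis[z.bit_length() - 1] = z
            if len(basis) == n - 1:
                return solve_for_period(basis, n)
    return None

print(simon(6, 0b010110, 20, random.Random(1)))
```

The sketch stores each accepted sample in reduced form rather than as the raw $z$; the span is the same, and it keeps every later independence test to one pass over the bits.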


Time and Space Complexity 


We will run our circuit $n + T = O(n)$ times. There will be some classical algorithms that need to be tacked on, producing an overall growth rate of about $O(n^3)$ or $O(n^4)$, depending on the cleverness of the overall design. But let's back up a moment.


I've only mentioned time complexity. What about circuit, a.k.a. spatial, complexity?

There is a circuit that must be built, and we can tell by looking at the number 
of inputs and outputs to the circuit (2n each) that the spatial complexity is O(n). 
I mentioned during our oracle lesson that we can ignore the internal design of the 
oracle because it, like its associated f, may be arbitrarily complex; all we are doing in 
these algorithms is getting a relativized speed-up. But even setting aside the oracle’s 
internals, we can’t ignore the O(n) spatial input/output size leading to/from the 
oracle as well as the circuit as a whole. 


Therefore, if I operate the circuit in an algorithmic "while loop" of $n + T = O(n)$ passes, the time complexity of the quantum circuit (not counting the classical tools to be added) is $O(n)$. Meanwhile, each pass of the circuit uses $O(n)$ wires and gates, giving a more honest $O(n^2)$ growth to the { algorithm + circuit } representing (only) the quantum portion of the algorithm.

So, why do I (and others) describe the complexity of the quantum portion of the algorithm to be $O(n)$ and not $O(n^2)$?




We actually touched on these reasons when describing why we tend to ignore 
hardware growth for these relatively simple circuits. I’ll reprise the two reasons given 
at that time. 


• Our interest is in relative speed-up of quantum over classical methods. Since we need $O(n)$ inputs/outputs in both the classical and quantum circuit, it is unnecessary to carry the same cost of $O(n)$ on the books; it cancels out when we compare the two regimes.

• The quantum circuits grow slowly (linearly or logarithmically) compared to more expensive classical post processing whose time complexity is often at least quadratic and (this is important) in series with the quantum circuit. These attending sub-algorithms therefore render an $O(n)$ or $O(n^2)$ quantum growth invisible.


So, be aware of the true overall complexity, but understand that it’s usually un- 
necessary to multiply by a linear circuit complexity when doing computational ac- 
counting. 


18.9.3 Strange Behavior 


This section is not required but may add insight and concreteness to your under- 
standing of the concepts. 


[Notational Alert. I'll use vector notation ($\mathbf{a}$, $\mathbf{w}_k$) rather than encoded decimal ($a$, $w_k$) to underscore the otherwise subtle concepts.]


We know that $(\mathbb{Z}_2)^n$ is not your grandparents' vector space; it has a dot-product that is not positive definite (e.g., $\mathbf{a} \cdot \mathbf{a}$ might $= 0$), which leads to screwy things. Nothing that we can't handle, but we have to be careful.

We'll be constructing a basis for $(\mathbb{Z}_2)^n$ in which the first $n - 1$ vectors, $\mathbf{w}_0, \ldots, \mathbf{w}_{n-2}$, are orthogonal to $\mathbf{a}$, while the $n$th vector, $\mathbf{w}_{n-1}$, is not. In general, this basis will not be an orthonormal basis. It is sometimes, but not usually, and in certain cases it cannot be. To see cases where it cannot be, consider a period $\mathbf{a}$ that is orthogonal to itself. In such a case, since $\mathbf{a} \in \mathbf{a}^\perp$, we might have $\mathbf{a} = \mathbf{w}_k$ for some $k < n - 1$, say $\mathbf{a} = \mathbf{w}_0$. Therefore, we end up with

$$\mathbf{w}_0 \cdot \mathbf{w}_0 = \mathbf{a} \cdot \mathbf{a} = 0 \quad \text{and} \quad \mathbf{w}_0 \cdot \mathbf{w}_{n-1} = \mathbf{a} \cdot \mathbf{w}_{n-1} = 1,$$

either condition contradicting orthonormality.


In particular, we cannot use the dot-with-basis trick to compute expansion coefficients, a trick that we saw required an orthonormal basis to work. So, in this regime, even when expanding a vector $\mathbf{v}$ along the algorithm's resulting w-basis,

$$\mathbf{v} = \sum_{k=0}^{n-1} c_k\, \mathbf{w}_k,$$

we would (sadly) discover that, for many of the coefficients, $c_k$,

$$c_k \neq \mathbf{v} \cdot \mathbf{w}_k.$$

This is not a big deal, since we will not need the trick, but it's a good mental exercise to acknowledge naturally occurring vector spaces that give rise to non-positive definite pairings and be careful not to use those pairings as if they possessed our usual properties.
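[Optional check. Here is a tiny Python illustration of the dot-with-basis trick failing. The period $\mathbf{a} = (0,1,0,1)^t$ is self-orthogonal; the particular basis below (with $\mathbf{w}_0 = \mathbf{a}$) is one I made up for this sketch and is not claimed to match the examples that follow. The helper `expand` finds the true expansion coefficients by brute force.]

```python
from itertools import product

def dot2(y, x):
    """Mod-2 dot product of two integers read as bit vectors."""
    return bin(y & x).count("1") % 2

a = 0b0101                                   # a . a = 0 (mod 2)
# Hypothetical basis: w0 = a, w1 and w2 also in a-perp, w3 not orthogonal to a.
basis = [0b0101, 0b0010, 0b1000, 0b0001]

def expand(v, basis):
    """Return the coefficients c_k in {0,1} with v = XOR of c_k * w_k."""
    for coeffs in product((0, 1), repeat=len(basis)):
        total = 0
        for c, w in zip(coeffs, basis):
            if c:
                total ^= w
        if total == v:
            return coeffs
    raise ValueError("basis does not span v")

v = a
true_coeffs = expand(v, basis)                   # what the expansion really is
dot_coeffs = tuple(dot2(v, w) for w in basis)    # what the "trick" would claim

print("true c_k :", true_coeffs)
print("v . w_k  :", dot_coeffs)
```

Since $\mathbf{v} = \mathbf{w}_0$, the true coefficients are $(1, 0, 0, 0)$, yet the dot products come out $(0, 0, 0, 1)$: wrong in the first slot because $\mathbf{a} \cdot \mathbf{a} = 0$, and wrong in the last because $\mathbf{a} \cdot \mathbf{w}_3 = 1$.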


Example #1 


Take $n = 4$ and

$$\mathbf{a} = (0, 1, 0, 1)^t.$$


We may end up with a basis that uses the 3-dimensional subspace, $\mathbf{a}^\perp$, generated by the three vectors


ooor 
ao) 
FOrHRO 
I 
) 


coor © 


These four vectors form a possible outcome of our algorithm when applied to $(\mathbb{Z}_2)^4$ with a period of $\mathbf{a} = (0, 1, 0, 1)^t$, and you can confirm the odd claims I made above.


Example #2 


If you’d like more practice with this, try the self-orthogonal 


jab) 
II 
eee Ye 




and the basis consisting of the three vectors orthogonal to a, 


eee 
I| 
jab) 


1 
0 

2 0 ? 
1 


oOo KF 


plus a fourth basis vector not orthogonal (to a) 


oo oF 


This isn't an orthonormal basis, nor will the dot-with-basis trick work.


[Exercise. Create some of your own examples in which you do, and do not, get an orthonormal basis that results from the algorithm's desired outcome of $n - 1$ basis vectors being orthogonal to $\mathbf{a}$ and the $n$th, not.]


18.10 Time Complexity of the Quantum Algorithm 


18.10.1 Producing n − 1 Linearly-Independent w_k in Polynomial Time: Argument 1


This section contains our officially sanctioned proof that one will get the desired $n - 1$ vectors, orthogonal to $a$, with polynomial complexity in $n$. It is a very straightforward argument that considers sampling the circuit $n + T$ times. While it has several steps, each one is comprehensible and contains the kind of arithmetic every quantum computer scientist should be able to reproduce.


Theorem. If we randomly select $m + T$ samples from $\mathbb{Z}_{2^m}$, the probability that these samples will contain a linearly-independent subset of $m$ vectors is $> 1 - (1/2)^T$.


Notice that the constant $T$ is independent of $m$, so the process of selecting the $m$ independent vectors is $O(m + T) = O(m)$, not counting any sub-algorithms or arithmetic we have to apply in the process (which we'll get to). We'll be using $m = n - 1$, the dimension of $a^\perp$ ($\cong \mathbb{Z}_{2^{n-1}}$).


18.10.2 Proof of Theorem Used by Argument 1 


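[Notational Alert. In the proof, I'll use vector notation, as in $\mathbf{z} \in (\mathbb{Z}_2)^m$ or $\mathbf{c} \in (\mathbb{Z}_2)^{m+T}$, which presents vectors in boldface.]


Pick $m + T$ vectors, $\{\, \mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{m+T-1} \,\}$, at random from $(\mathbb{Z}_2)^m$.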




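• Step 1: Form a matrix and look at the column vectors. We begin by stacking the $m + T$ vectors atop one another, forming the matrix,

$$\begin{pmatrix} \mathbf{z}_0 \\ \mathbf{z}_1 \\ \mathbf{z}_2 \\ \vdots \\ \mathbf{z}_{m+T-1} \end{pmatrix} = \begin{pmatrix} z_{00} & z_{01} & \cdots & z_{0(m-1)} \\ z_{10} & z_{11} & \cdots & z_{1(m-1)} \\ z_{20} & z_{21} & \cdots & z_{2(m-1)} \\ \vdots & \vdots & & \vdots \\ z_{(m+T-1)0} & z_{(m+T-1)1} & \cdots & z_{(m+T-1)(m-1)} \end{pmatrix}.$$

The number of independent rows is the row rank of this matrix, and by elementary linear algebra, the row rank = column rank. So, let's change our perspective and think of this matrix as a set of $m$ column vectors, each of dimension $m + T$. We would be done if we could show that all $m$ of the column vectors

$$\begin{pmatrix} z_{00} \\ z_{10} \\ z_{20} \\ \vdots \\ z_{(m+T-1)0} \end{pmatrix},\ \begin{pmatrix} z_{01} \\ z_{11} \\ z_{21} \\ \vdots \\ z_{(m+T-1)1} \end{pmatrix},\ \ldots,\ \begin{pmatrix} z_{0(m-1)} \\ z_{1(m-1)} \\ z_{2(m-1)} \\ \vdots \\ z_{(m+T-1)(m-1)} \end{pmatrix} \;\equiv\; \mathbf{c}_0,\ \mathbf{c}_1,\ \ldots,\ \mathbf{c}_{m-1}$$

were linearly independent with probability $> 1 - (1/2)^{T}$. (That would mean the column rank was $m$.)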


[This row rank = column rank trick has other applications in quantum computing, and, in particular, will be used when we study Neumark's construction for "orthogonalizing" a set of general measurements in the next course. Neumark's construction is a conceptual first step towards noisy-system analysis.]


• Step 2: Express the probability that all $m$ column vectors, $\mathbf{c}_k$, are independent as a product of $m$ conditional probabilities.

[Note: This is why we switched to column rank. The row vectors, we know, are not linearly independent after taking $T + m > m$ samples, but we can sample many more than $m$ samples and continue to look at the increasingly longer column vectors, eventually making those linearly independent.]

Let

$$\mathscr{I}(j) \;\equiv\; \text{event that } \mathbf{c}_0, \mathbf{c}_1, \ldots, \mathbf{c}_{j-1} \text{ are linearly-independent.}$$

Our goal is to compute the probability of $\mathscr{I}(m)$. Combining the basic identity,

$$P(\mathscr{I}(m)) = P\big(\mathscr{I}(m) \wedge \mathscr{I}(m-1)\big) + P\big(\mathscr{I}(m) \wedge \neg\mathscr{I}(m-1)\big),$$

with the observation that

$$P\big(\mathscr{I}(m) \wedge \neg\mathscr{I}(m-1)\big) = 0$$

we can write

$$P(\mathscr{I}(m)) = P\big(\mathscr{I}(m) \wedge \mathscr{I}(m-1)\big) = P\big(\mathscr{I}(m) \,|\, \mathscr{I}(m-1)\big)\, P\big(\mathscr{I}(m-1)\big),$$

the last equality from Bayes' rule.

Now, apply that same process to the right-most factor, successively, and you end up with

$$P(\mathscr{I}(m)) = P\big(\mathscr{I}(m) \,|\, \mathscr{I}(m-1)\big)\, P\big(\mathscr{I}(m-1) \,|\, \mathscr{I}(m-2)\big) \cdots P\big(\mathscr{I}(2) \,|\, \mathscr{I}(1)\big)\, P\big(\mathscr{I}(1)\big) = \prod_{j=1}^{m} P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big).$$

(The $j = 1$ term might look a little strange, because it refers to the undefined $\mathscr{I}(0)$, but it makes sense if we view the event $\mathscr{I}(0)$, i.e., that in a set of no vectors they're all linearly-independent, as vacuously true. Therefore, we see that $\mathscr{I}(0)$ can be said to have probability 1, so the $j = 1$ term reduces to $P(\mathscr{I}(1))$, without a conditional, in agreement with the line above.)


• Step 3: Compute the $j$th factor, $P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big)$, in this product.

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - P\big(\neg\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big)$$

But what does "$\neg\mathscr{I}(j) \,|\, \mathscr{I}(j-1)$" mean? It means two things:

1. It assumes the first $j - 1$ vectors, $\{\mathbf{c}_0, \ldots, \mathbf{c}_{j-2}\}$, are independent and therefore span the largest space possible for $j - 1$ vectors, namely, one of size $2^{j-1}$, and

2. The $j$th vector, $\mathbf{c}_{j-1}$, is $\in$ the span of the first $j - 1$ vectors $\{\mathbf{c}_0, \ldots, \mathbf{c}_{j-2}\}$.

This probability is computed by counting the number of ways we can select a vector from the entire space of $2^{m+T}$ vectors (remember, our column vectors have $m + T$ coordinates) that also happens to be in the span of the first $j - 1$ vectors (a subspace of size $2^{j-1}$).

[Why size $2^{j-1}$? Express the first $j - 1$ column vectors in their own coordinates, i.e., in a basis that starts with them, then adds more basis vectors to get the full basis for $(\mathbb{Z}_2)^{m+T}$. Recall that basis vectors expanded along themselves always have a single 1 coordinate sitting in a column of 0s. Looked at this way, how many distinct vectors can be formed out of various sums of the original $j - 1$?]

The probability we seek is, by definition of probability, just the ratio

$$\frac{2^{j-1}}{2^{m+T}} = \left(\frac{1}{2}\right)^{m+T-j+1},$$

so

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - \left(\frac{1}{2}\right)^{m+T-j+1}, \quad \text{for } j = 1, 2, \ldots, m.$$


Sanity Check. Now is a good time for the computer scientist’s first line of 
defense when coming across a messy formula: the sanity check. 


Does this make sense for $j = 1$, the case of a single vector $\mathbf{c}_0$? The formula tells us that the chances of getting a single, linearly-independent vector is

$$P(\mathscr{I}(1)) = 1 - \left(\frac{1}{2}\right)^{m+T-1+1} = 1 - \left(\frac{1}{2}\right)^{m+T}.$$

Wait, shouldn't the first vector be 100% certain? No, we might get unlucky and pick the 0-vector with probability $1/2^{m+T}$, which is exactly what the formula predicts.

That was too easy, so let's do one more. What about $j = 2$? The first vector, $\mathbf{c}_0$, spans a space of two vectors (as do all non-zero single vectors in $(\mathbb{Z}_2)^n$). The chances of picking a second vector from this set would be 2/(size of the space), which is $2/2^{m+T} = 1/2^{m+T-1}$. The formula predicts that we will get a second independent vector, not in the span of $\mathbf{c}_0$, with probability

$$P\big(\mathscr{I}(2) \,|\, \mathscr{I}(1)\big) = 1 - \left(\frac{1}{2}\right)^{m+T-2+1} = 1 - \left(\frac{1}{2}\right)^{m+T-1},$$

exactly the complement of the probability that $\mathbf{c}_1$ just happens to get pulled from the set of two vectors spanned by $\mathbf{c}_0$, as computed.


So, spot testing supports the correctness of the derived formula. 

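• Step 4: Plug the expression for the $j$th factor (step 3) back into the full probability formula for all $m$ vectors (step 2).

In step 2, we decomposed the probability of "success", as a product,

$$P(\mathscr{I}(m)) = \prod_{j=1}^{m} P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big).$$

In step 3, we computed the value for each term,

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - \left(\frac{1}{2}\right)^{m+T-j+1}.$$

Combining the two, we get

$$P(\mathscr{I}(m)) = \prod_{j=1}^{m} \left( 1 - \left(\frac{1}{2}\right)^{m+T-j+1} \right),$$

which can easily be re-indexed into the form

$$P(\mathscr{I}(m)) = \prod_{i=T+1}^{T+m} \left( 1 - \left(\frac{1}{2}\right)^{i} \right).$$

[Exercise. Prove the last assertion about re-indexing.]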


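• Step 5: We prove a cute mathematical lemma. Let's take a pleasant off-trail walk to prove a needed fact.

Lemma. If the $p$ numbers $a_1, \ldots, a_p$ satisfy $0 \leq a_i \leq 1$, then

$$\prod_{i=1}^{p} (1 - a_i) \;\geq\; 1 - \sum_{i=1}^{p} a_i.$$

Proof by Induction.

- Case $p = 1$:

$$1 - a_1 \;\geq\; 1 - a_1. \;\checkmark$$

- Consider any $p > 1$ and assume the claim is true for $p - 1$. Then

$$\prod_{i=1}^{p} (1 - a_i) = (1 - a_p) \prod_{i=1}^{p-1} (1 - a_i)$$

$$\geq (1 - a_p) \left( 1 - \sum_{i=1}^{p-1} a_i \right)$$

$$= 1 - \sum_{i=1}^{p-1} a_i - a_p + a_p \sum_{i=1}^{p-1} a_i$$

$$\geq 1 - \sum_{i=1}^{p} a_i. \;\checkmark \quad \text{QED}$$

[Exercise. Where did we use the hypothesis $a_i \geq 0$? Same question, $a_i \leq 1$.]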
[Exercise. Is the stronger hypothesis ‘>a; > 0 needed for this lemma? Why 
or why not? 




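• Step 6: Apply the lemma to the conclusion of step 4 to finish off the proof. Using $m$ for the $p$ of the lemma, we obtain

$$P(\mathscr{I}(m)) = \prod_{i=T+1}^{T+m} \left( 1 - \left(\frac{1}{2}\right)^{i} \right) \;\geq\; 1 - \sum_{i=T+1}^{T+m} \left(\frac{1}{2}\right)^{i} = 1 - \left(\frac{1}{2}\right)^{T} \left[ \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{i} \right].$$

But that big bracketed sum on the RHS is a bunch of distinct and positive powers of 1/2, which can never add up to more than 1 (think binary floating point numbers like .101011 or .00111 or .1111111), so that sum is $< 1$, i.e.,

$$\left(\frac{1}{2}\right)^{T} \left[ \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{i} \right] < \left(\frac{1}{2}\right)^{T} \;\;\Longrightarrow\;\; 1 - \left(\frac{1}{2}\right)^{T} \left[ \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{i} \right] > 1 - \left(\frac{1}{2}\right)^{T}.$$

Combining the results of the last two equation blocks we conclude

$$P(\mathscr{I}(m)) > 1 - \left(\frac{1}{2}\right)^{T}.$$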


This proves that the column vectors, $\mathbf{c}_j$, are linearly-independent with probability greater than $1 - 1/2^T$, and therefore the row vectors, our $\mathbf{z}_k$, also have at least $m$ linearly independent vectors among them (row rank = column rank, remember?), with that same lower-bound probability. QED
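[Optional check. For very small $m$ and $T$ the theorem can be confirmed by exhaustive counting. The Python sketch below enumerates every possible list of $m + T$ samples from $(\mathbb{Z}_2)^m$, counts how many contain $m$ independent vectors, and compares the exact fraction with the re-indexed product from step 4 and with the bound $1 - (1/2)^T$. The function names are my own, and the brute force is only practical for tiny cases.]

```python
from fractions import Fraction
from itertools import product

def rank_gf2(vectors):
    """Rank over Z_2 of a collection of integers read as bit vectors."""
    basis = {}
    for z in vectors:
        while z:
            top = z.bit_length() - 1
            if top not in basis:
                basis[top] = z
                break
            z ^= basis[top]
    return len(basis)

def product_formula(m, T):
    """Product over i = T+1 .. T+m of (1 - (1/2)^i), as an exact fraction."""
    p = Fraction(1)
    for i in range(T + 1, T + m + 1):
        p *= 1 - Fraction(1, 2 ** i)
    return p

def brute_force(m, T):
    """Exact fraction of all (m+T)-sample lists that contain m independent vectors."""
    samples = list(product(range(2 ** m), repeat=m + T))
    hits = sum(1 for s in samples if rank_gf2(s) == m)
    return Fraction(hits, len(samples))

for m, T in [(2, 1), (2, 2), (3, 1)]:
    print(m, T, brute_force(m, T), product_formula(m, T), 1 - Fraction(1, 2 ** T))
```

In each row the first two fractions should agree exactly, and both should exceed the third.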


18.10.3 Summary of Argument 1


We have demonstrated that by sampling $m + T$ vectors in a $\mathbb{Z}_2$ space of dimension $m$, we can be confident that the sample set contains $m$ linearly independent vectors with probability greater than $1 - (1/2)^T$, $T$ independent of $m$.

In Simon's problem, we were looking for $n - 1$ vectors that spanned the $(n-1)$-dimensional $a^\perp$. By applying the above result to $m = n - 1$, we see that we can find those $n - 1$ vectors with arbitrarily high probability by running our circuit $O(T + n - 1) = O(n)$ times.




18.10.4 Producing n − 1 Linearly-Independent w_k in Polynomial Time: Argument 2


This argument is seen frequently and is more straightforward than our preferred one, 
but it gives a weaker result in absolute terms. That is, it gives an unjustifiably 
conservative projection for the number of samples required to achieve n — 1 linearly 
independent vectors. Of course, this would not affect the performance of an actual 
algorithm, since all we are doing in these proofs is showing that we'll get linear 
independence fast. An actual quantum circuit would be indifferent to how quickly 
we think it should give us n — 1 independent vectors; it would reveal them in a time 
frame set by the laws of nature, not what we proved or didn’t prove. Still, it’s nice to 
predict the convergence to linear independence accurately, which this version doesn’t 
do quite as well as the first. Due to its simplicity and prevalence in the literature, I 
include it. 


Here are the steps. We’ll refer back to the first proof when we need a result that 
was already proven there. 


Theorem. If we randomly select $m$ samples from $\mathbb{Z}_{2^m}$, the probability that we have selected a complete, linearly-independent (and therefore basis) set is $> 1/4$.


This result (once proved) estimates a probability of at least $1 - (3/4)^T$ of getting a linearly independent set of vectors after sampling $\mathbb{Z}_{2^m}$ $mT$ times. The reason we have to take the product, $mT$, is that the theorem only computes the probability that results when we take exactly $m$ samples; it does not address the trickier math for overlapping sample sets or a slowly changing sample set that would come from adding one new sample and throwing away an old one. Nevertheless, it proves $O(m)$ complexity.


Keep in mind that we'll be applying this theorem to $m = n - 1$, the dimension of $a^\perp$ ($\cong \mathbb{Z}_{2^{n-1}}$).
18.10.5 Proof of Theorem Used by Argument 2 


[Notation. As with the first proof, we'll use boldface vector notation, $\mathbf{z} \in (\mathbb{Z}_2)^m$.]


Pick $m$ vectors, $\{\, \mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{m-1} \,\}$, at random from $(\mathbb{Z}_2)^m$.


• Step 1: Express the probability that the $m$ vectors, $\mathbf{z}_k$, are independent as a product of $m$ conditional probabilities.


Let

$$\mathscr{I}(j) \;\equiv\; \text{event that } \mathbf{z}_0, \mathbf{z}_1, \ldots, \mathbf{z}_{j-1} \text{ are linearly-independent.}$$


Our goal is to compute the probability of $\mathscr{I}(m)$.




Using the exact argument from argument 1, step 2, we conclude,

$$P(\mathscr{I}(m)) = \prod_{j=1}^{m} P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big).$$

(If interested, see argument 1, step 2 to account for the fact that $\mathscr{I}(0)$ can be said to have probability 1, implying that the $j = 1$ term reduces to $P(\mathscr{I}(1))$, without a conditional.)


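• Step 2: Compute the $j$th factor, $P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big)$, in this product.

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - P\big(\neg\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big)$$

But what does "$\neg\mathscr{I}(j) \,|\, \mathscr{I}(j-1)$" mean? It means two things:

1. It assumes the first $j - 1$ vectors, $\{\mathbf{z}_0, \ldots, \mathbf{z}_{j-2}\}$, are independent and therefore span the largest space possible for $j - 1$ vectors, namely, one of size $2^{j-1}$, and

2. The $j$th vector, $\mathbf{z}_{j-1}$, is $\in$ the span of the first $j - 1$ vectors $\{\mathbf{z}_0, \ldots, \mathbf{z}_{j-2}\}$.

This probability is computed by counting the number of ways we can select a vector from the entire space of $2^m$ vectors which happens to also be in the span of the first $j - 1$ vectors, a subspace of size $2^{j-1}$. But that's just the ratio

$$\frac{2^{j-1}}{2^{m}} = \left(\frac{1}{2}\right)^{m-j+1},$$

so

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - \left(\frac{1}{2}\right)^{m-j+1}, \quad \text{for } j = 1, 2, \ldots, m.$$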


Sanity Check. Does this make sense for $j = 1$, the case of a single vector $\mathbf{z}_0$? The formula tells us that the chances of getting a single, linearly-independent vector is

$$P(\mathscr{I}(1)) = 1 - \left(\frac{1}{2}\right)^{m-1+1} = 1 - \left(\frac{1}{2}\right)^{m}.$$

Wait, shouldn't the first vector be 100% certain? No, we might get unlucky and pick the 0-vector with probability $1/2^m$, which is exactly what the formula predicts.

That was too easy, so let's do one more. What about $j = 2$? The first vector, $\mathbf{z}_0$, spans a space of two vectors (as do all non-zero single vectors in $(\mathbb{Z}_2)^n$). The chances of picking a second vector from this set would be 2/(size of the space), which is $2/2^m = 1/2^{m-1}$. The formula predicts that we will get a second independent vector, not in the span of $\mathbf{z}_0$, with probability

$$P\big(\mathscr{I}(2) \,|\, \mathscr{I}(1)\big) = 1 - \left(\frac{1}{2}\right)^{m-2+1} = 1 - \left(\frac{1}{2}\right)^{m-1},$$

exactly the complement of the probability that $\mathbf{z}_1$ just happens to get pulled from the set of two vectors spanned by $\mathbf{z}_0$, as computed.


So, spot testing supports the formula’s claim. 


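• Step 3: Plug the expression for the $j$th factor (step 2) back into the full probability formula for all $m$ vectors (step 1).

In step 1, we decomposed the probability of "success", as a product,

$$P(\mathscr{I}(m)) = \prod_{j=1}^{m} P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big).$$

In step 2, we computed the value for each term,

$$P\big(\mathscr{I}(j) \,|\, \mathscr{I}(j-1)\big) = 1 - \left(\frac{1}{2}\right)^{m-j+1}.$$

Combining the two, we get

$$P(\mathscr{I}(m)) = \prod_{j=1}^{m} \left( 1 - \left(\frac{1}{2}\right)^{m-j+1} \right),$$

which can easily be re-indexed into the form

$$P(\mathscr{I}(m)) = \prod_{i=1}^{m} \left( 1 - \left(\frac{1}{2}\right)^{i} \right).$$

[Exercise. Prove the last assertion about re-indexing.]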


• Step 4: Apply a result from q-Series to finish off the proof.

The expression we have can be multiplied by values $< 1$ if we are interested in a lower bound (which we are). Therefore, we can include all the factors for $i > m$ without harm:

$$P(\mathscr{I}(m)) = \prod_{i=1}^{m} \left( 1 - \left(\frac{1}{2}\right)^{i} \right) \;\geq\; \prod_{i=1}^{\infty} \left( 1 - \left(\frac{1}{2}\right)^{i} \right).$$

From here one could just quote a result from the theory of mathematical q-series, namely that this infinite product is about .28879. As an alternative, there are some elementary proofs that involve taking the natural log of the product, splitting it into a finite sum plus an infinite error sum, then estimating the error. We'll accept the result without further ado, which implies

$$P(\mathscr{I}(m)) \;\geq\; \prod_{i=1}^{\infty} \left( 1 - \left(\frac{1}{2}\right)^{i} \right) \;>\; .25.$$

This proves that the $m$ vectors, $\mathbf{z}_k$, are linearly-independent with probability $> 1/4$.
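[Optional check. The quoted value of the infinite product is easy to approach numerically. The short Python sketch below evaluates the finite product for a few values of $m$; it is a floating-point illustration, not a proof of the $> 1/4$ bound, and `partial_product` is my own name for the helper.]

```python
def partial_product(m):
    """prod_{i=1}^{m} (1 - (1/2)^i): the exact success probability for m samples."""
    p = 1.0
    for i in range(1, m + 1):
        p *= 1.0 - 0.5 ** i
    return p

for m in (1, 2, 5, 10, 20, 60):
    print(m, partial_product(m))
```

The values start at 0.5, fall with each added factor, and level off near .28879, staying above 1/4 throughout.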


18.10.6 Summary of Argument 2 


We have demonstrated that a random sample of exactly $m$ vectors in a $\mathbb{Z}_2$ space of dimension $m$ will be linearly independent with probability at least 1/4. QED


Corollary. After $T$ full cycles, each cycle pulling $m$ random samples from $\mathbb{Z}_{2^m}$ (not reusing vectors from previous cycles), the probability that none of the $T$ sets of $m$-vectors is linearly independent is $< (3/4)^T$.


Proof. This follows immediately from the theorem since the chances of failure for selecting one set of $m$ independent vectors = 1 − chance of success $< 3/4$, as the theorem showed. Each cycle is independent so the probability that all of them fail is the product of the probabilities that each one fails, and therefore $< (3/4)^T$. QED


In Simon's problem, we were looking for $n - 1$ vectors that spanned the $(n-1)$-dimensional $a^\perp$. By applying the above result to $m = n - 1$, we see that we can find those $n - 1$ vectors with arbitrarily high probability by running our circuit $O(T(n - 1)) = O(n)$ times.


18.10.7 Discussion of the Two Proofs’ Complexity Estimates 


Both proofs give the correct O(n) complexity (of the algorithm’s quantum processing)
for finding n - 1 independent vectors spanning $a^\perp$ in the context of $(\mathbb{Z}_2)^n$.


The second proof is appealing in its simplicity, but part of that simplicity is due 
to its handing off a key result to the number theorists. (I have found no sources that 
give all the details of the > 1/4 step). Also, the number of circuit samples argument 
2 requires for a given level of confidence is, while still O(n), many times larger than 
one really needs, and, in particular, many more than the first proof. This is because 
it does not provide probabilities for overlapping sample sets, but rather tosses out 
all n — 1 samples of any set that fails. One can adjust the argument to account for 
this conditional dependence, but that’s a different argument; if we know that one 
set is not linearly independent, then reusing its vectors requires trickier math than 
this proof covers. Of course, this is mainly because it is only meant to be a proof of 
polynomial time complexity, and not a blueprint for implementation. 




For example, for n = 10, T = 10, the second proof predicts that 10 × 9 = 90
samples would produce at least one of the 10 sets to be linearly independent with
probability greater than $1 - (3/4)^{10} \approx .943686$. In contrast, the first proof would only
ask for 9 + 10 = 19 samples to get confidence > .999023. That’s greater confidence
with fewer samples.
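

The arithmetic in this comparison can be checked with a short sketch. The second proof's bound, 1 - (3/4)^T, comes from the corollary above. For the first proof we assume the bound has the form 1 - (1/2)^T with n - 1 + T samples, which is what the quoted figure .999023 corresponds to at T = 10. The function names are ours.

```python
def argument2(n: int, T: int) -> tuple[int, float]:
    """Samples used and confidence bound for the second proof: T cycles of n-1 samples."""
    return T * (n - 1), 1 - 0.75 ** T


def argument1(n: int, T: int) -> tuple[int, float]:
    """Samples used and confidence bound assumed for the first proof: n-1+T samples."""
    return (n - 1) + T, 1 - 0.5 ** T


n, T = 10, 10
print("argument 2:", argument2(n, T))
print("argument 1:", argument1(n, T))
```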


18.11 The Hidden Classical Algorithms and Their 
Cost 


18.11.1 Unaccounted for Steps 


As far as we have come, we can’t claim victory yet, especially after having set the 
rather high bar of proving, not just uttering, the key facts that lead to our end result. 


We have a quantum circuit with O(n) gates which we activate O(n) times to find 
the unknown period, a, with arbitrarily high probability. We’ve agreed to consider 
the time needed to operate that portion of the quantum algorithm and ignore the 
circuit’s linear growth. Therefore, we have thus far accounted for a while loop which 
requires O(n) passes to get a. 


However, there are steps in the sampling process where we have implicitly used 
some non-trivial classical algorithms that our classical computers must execute. We 
need to see where they fit into the big picture and incorporate their costs. There are 
two general areas: 


1. The test for mod-2 linear independence in $(\mathbb{Z}_2)^n$ that our iterative process has
used throughout.


2. The cost of solving the system of n mod-2 equations,


\[ w_k \cdot a \;=\; \begin{cases} 0, & k = 0, \ldots, n-2 \\ 1, & k = n-1 \end{cases} \]


For those of you who will be skipping the details in the next sections, I'll reveal 
the results now: 


1. The test for mod-2 linear independence is handled using mod-2 Gaussian elimination
which we will show to be $O(n^3)$.


2. Solving the system of n mod-2 equations is handled using back substitution
which we will show to be $O(n^2)$.


3. The two classical algorithms will be applied in series, so we only need to count
the larger of the two, $O(n^3)$. Together, they are applied once for each quantum
sample, already computed to be O(n), resulting in a nested count of $O(n^4)$.


4. We'll tweak the classical tools by integrating them into Simon’s algorithm so
that their combined cost is only $O(n^2)$, resulting in a final count of $O(n^3)$.


Conclusion. Our implementation of Simon’s algorithm has a growth rate of
$O(n^3)$. It is polynomial fast.


18.12 Solving Systems of Mod-2 Equations 


18.12.1 Gaussian Elimination and Back Substitution 


Conveniently, both remaining classical tasks are addressed by an age old and well 
documented technique in linear algebra for solving systems of linear equations called 
“Gaussian elimination with back substitution.” It’s a mouthful but easy to learn and, 
as with all the ancillary math we have been forced to cover, applicable throughout 
engineering. In short, a system of linear equations 


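\[ \begin{aligned} 5x + y + 2z + w &= 7 \\ x - y + z + 2w &= 10 \\ x + 2y - 3z + 7w &= -3 \end{aligned} \]


is symbolized by the matrix of its constant coefficients, as in


\[ \begin{pmatrix} 5 & 1 & 2 & 1 \\ 1 & -1 & 1 & 2 \\ 1 & 2 & -3 & 7 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \;=\; \begin{pmatrix} 7 \\ 10 \\ -3 \end{pmatrix} . \]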


As the example shows, there’s no requirement that the system have the same number
of equations as unknowns; the fewer equations, the less you will know about the solutions.
(Instead of the solution being a unique vector like $(x, y, z, w)^t = (3, 0, -7, 2)^t$,
it might be a relation between the components, like $(a, 4a, -.5a, 3a)^t$, with a free
to roam over $\mathbb{R}$.) Nevertheless, we can apply our techniques to any sized system.


We break it into the two parts, 


e Gaussian elimination, which produces a matrix with Os in the lower left triangle, 
and 


e back substitution, which uses that matrix to solve the system of equations as 
best we can, meaning that if there are not enough equations, we might only get 
relations between unknowns, rather than unique numbers. 




18.12.2 Gaussian Elimination 
Gaussian Elimination for Decimal Matrices 


Gaussian Elimination seeks to change the matrix for the equation into (depending 
on who you read), either 


e row echelon form, in which everything below the “diagonal” is 0, as in 
33 0 5 
Oi 225° olf co. OF 
005 -7 


e reduced row echelon form, which is echelon form with the additional re- 
quirement that the first non-zero element in each row be 1, e.g., 


it. Oe 2573 
Of 13. 28 
0.0: 1 22775 


In our case, where all the values are integers mod-2 (just 0 and 1), the two are actually 
equivalent: all non-zero values are 1, automatically. 


Properties of Echelon Forms 


Reduced or not, row echelon forms have some important properties that we will need. 
Let’s first list them, then have a peek at how one uses Gaussian elimination (GE), to 
convert any matrix to an echelon form. 


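• Geometrically, the diagonal, under which all the elements must be 0, is clear in
a square matrix:


\[ \begin{pmatrix} \bullet & * & * & \cdots & * \\ 0 & \bullet & * & \cdots & * \\ 0 & 0 & \bullet & \cdots & * \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \bullet \end{pmatrix} \]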


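When the matrix is not square, the diagonal is geometrically visualized
relative to the upper left (position (0, 0)):


\[ \begin{pmatrix} \bullet & * & * & \cdots & * & * \\ 0 & \bullet & * & \cdots & * & * \\ 0 & 0 & \bullet & \cdots & * & * \end{pmatrix} \qquad \begin{pmatrix} \bullet & * & * \\ 0 & \bullet & * \\ 0 & 0 & \bullet \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]


In any case, a diagonal element is one that sits on position (k,k), for some k.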


• The first non-zero element on row k is to the right of the first non-zero element
of row k - 1, but it might be two or more positions to the right. I'll use a reduced
form which has the special value, 1, occupying the first non-zero element in a
row to demonstrate this. Note the extra 0s, underlined, that come about as a
result of some row being “pushed to the right” in this way.


\[ \begin{pmatrix} 1 & * & * & * & * & * & * \\ 0 & 1 & * & * & * & * & * \\ 0 & 0 & \underline{0} & \underline{0} & 1 & * & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * \end{pmatrix} \]


e Any all-zero rows necessarily appear at the bottom of the matrix. 


[Exercise. Show this follows from the definition.] 


• All non-zero row vectors in the matrix are, collectively, a linearly independent
set.


[Exercise. Prove it.]


• If we know that there are no all-zero row vectors in the echelon form, then the
number of rows ≤ number of columns.


[Exercise. Prove it.]


Including the RHS Constant Vector in the Gaussian Elimination Process 


When using GE to solve systems of equations, we have to be careful that the equations
that the reduced echelon form represents are equivalent to the original equations, and
to that end we have to modify the RHS column vector, e.g., $(7, 10, -3)^t$ of our
example, as we act on the matrix on the LHS. We thus start the festivities by placing
the RHS constant vector in the same “house” as the LHS matrix, but in a “room” of
its own,


\[ \left( \begin{array}{cccc|c} 5 & 1 & 2 & 1 & 7 \\ 1 & -1 & 1 & 2 & 10 \\ 1 & 2 & -3 & 7 & -3 \end{array} \right) \]


We will modify both the matrix and the vector at the same time, eventually resulting 
in the row-echelon form, 


17 
005 -7|-% 


5 13 

50, ==, [Ss 
1 2 832 
O13 3 |e 
7 17 
00 1 ~§ | 35 


The Three Operations that Produce Echelon Forms 


There are only three legal operations that we need to consider when performing GE, 


1. swapping two rows, 
2. multiplying a row by a nonzero value, and 


3. adding a multiple of one row to another.


[Exercise. Prove that these operations produce equations that have the identical
solution(s) as the original equations.]




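For example,


\[ \left( \begin{array}{cccc|c} 5 & 1 & 2 & 1 & 7 \\ 1 & -1 & 1 & 2 & 10 \\ 1 & 2 & -3 & 7 & -3 \end{array} \right) \xrightarrow{\;-5 \,\times\, \text{2nd row}\;} \left( \begin{array}{cccc|c} 5 & 1 & 2 & 1 & 7 \\ -5 & 5 & -5 & -10 & -50 \\ 1 & 2 & -3 & 7 & -3 \end{array} \right) \]

\[ \xrightarrow{\;\text{add 1st to 2nd}\;} \left( \begin{array}{cccc|c} 5 & 1 & 2 & 1 & 7 \\ 0 & 6 & -3 & -9 & -43 \\ 1 & 2 & -3 & 7 & -3 \end{array} \right) \xrightarrow{\;\text{swap 1st and 3rd}\;} \left( \begin{array}{cccc|c} 1 & 2 & -3 & 7 & -3 \\ 0 & 6 & -3 & -9 & -43 \\ 5 & 1 & 2 & 1 & 7 \end{array} \right) \]

\[ \xrightarrow{\;\text{add } -5 \,\times\, \text{1st to 3rd}\;} \left( \begin{array}{cccc|c} 1 & 2 & -3 & 7 & -3 \\ 0 & 6 & -3 & -9 & -43 \\ 0 & -9 & 17 & -34 & 22 \end{array} \right) \xrightarrow{\;\text{add } \frac{3}{2} \,\times\, \text{2nd to 3rd}\;} \left( \begin{array}{cccc|c} 1 & 2 & -3 & 7 & -3 \\ 0 & 6 & -3 & -9 & -43 \\ 0 & 0 & \frac{25}{2} & -\frac{95}{2} & -\frac{85}{2} \end{array} \right) \]


etc. (These particular operations may not lead to the echelon forms, above; they’re
just illustrations of the three rules.)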
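

The three legal operations are easy to try out by machine. The following is a minimal sketch, not a full GE routine: the helper names (`swap_rows`, `scale_row`, `add_multiple`) are ours, exact rational arithmetic comes from Python's standard `fractions` module, and the matrix is the augmented matrix of the example system used in this section.

```python
from fractions import Fraction


def swap_rows(M, i, j):
    """Operation 1: swap rows i and j (returns a new matrix)."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale_row(M, i, c):
    """Operation 2: multiply row i by the nonzero value c."""
    if c == 0:
        raise ValueError("the multiplier must be nonzero")
    M = [row[:] for row in M]
    M[i] = [c * v for v in M[i]]
    return M


def add_multiple(M, src, dst, c):
    """Operation 3: add c times row src to row dst."""
    M = [row[:] for row in M]
    M[dst] = [d + c * s for s, d in zip(M[src], M[dst])]
    return M


# Augmented matrix (coefficients | right-hand side) of the example system.
M = [[Fraction(v) for v in row] for row in
     [[5, 1, 2, 1, 7],
      [1, -1, 1, 2, 10],
      [1, 2, -3, 7, -3]]]

M = scale_row(M, 1, -5)                    # -5 x 2nd row
M = add_multiple(M, 0, 1, 1)               # add 1st to 2nd
M = swap_rows(M, 0, 2)                     # swap 1st and 3rd
M = add_multiple(M, 0, 2, -5)              # add -5 x 1st to 3rd
M = add_multiple(M, 1, 2, Fraction(3, 2))  # add 3/2 x 2nd to 3rd

for row in M:
    print([str(v) for v in row])
```

Each helper returns a fresh matrix rather than modifying its argument, so any intermediate stage can be kept around and inspected.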


The Cost of Decimal-Based Gaussian Elimination 


GE is firmly established in the literature, so for those among you who are interested,
I'll prescribe a web search to dig up the exact sequence of operations needed to produce
a row reduced echelon form. The simplest algorithms with no short-cuts use $O(n^3)$
operations, where an “operation” is either addition or multiplication, and n is the
larger of the matrix’s two dimensions. Some special techniques can improve that, but
it is always worse than $O(n^2)$, so we'll be satisfied with the simpler $O(n^3)$.


To that result we must incorporate the cost of each multiplication and addition
operation. For GE, multiplication could involve increasingly large numbers and, if
incorporated into the full accounting, would change the complexity to slightly better
than $O\big(n^3 (\log m)^2\big)$, where m is the absolute value of the largest integer involved.
Addition is less costly and done in series with the multiplications so does not erode
performance further.


The Cost of Mod-2 Gaussian Elimination 


For mod-2 arithmetic, however, we can express the complexity without the extra variable,
m. Our matrices consist of only 0s and 1s, so the “multiplication” in operations
2 and 3 reduces to either the identity (1 × a row) or producing a row of 0s (0 × a
row). Therefore, we ignore the multiplicative cost completely. Likewise, the addition
operation that is counted in the general GE algorithm is between matrix elements,
each of which might be arbitrarily large. But for us, all matrix elements are either 0
or 1, so our addition is a constant time XOR between bits.


All this means that in the mod-2 milieu, each of our $O(n^3)$ GE operations requires
some constant-time XORs (for the additions) and constant time if-statements (to
account for the multiplication by 0 or 1). Evidently, the total mod-2 GE cost is
untarnished at $O(n^3)$.


With some fancy footwork in a special section, below, we’ll improve it to $O(n^2)$,
but we don’t really care about the exact figure once we have it down this far. All we
require is polynomial time to make Simon a success story. However, we are exercising
our ability to evaluate algorithms, historical or future, so this continues to be an
exercise worthy of our efforts.
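

To make the counting concrete, here is a minimal sketch of plain mod-2 Gaussian elimination (the straightforward version, not the tuned one promised for later). Rows are lists of 0s and 1s, the function name `gaussian_elimination_mod2` is ours, and the only arithmetic used is the bitwise XOR discussed above.

```python
def gaussian_elimination_mod2(rows):
    """Return a row echelon form of a 0/1 matrix, using only swaps and XORs."""
    M = [r[:] for r in rows]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a 1 in this column.
        sel = next((r for r in range(pivot_row, n_rows) if M[r][col] == 1), None)
        if sel is None:
            continue  # no pivot here; the next row gets "pushed to the right"
        M[pivot_row], M[sel] = M[sel], M[pivot_row]
        # Clear the 1s below the pivot; "multiply by 0 or 1" becomes an if-statement.
        for r in range(pivot_row + 1, n_rows):
            if M[r][col] == 1:
                M[r] = [a ^ b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M


example = [[1, 1, 0, 1],
           [1, 0, 1, 1],
           [0, 1, 1, 0]]
for row in gaussian_elimination_mod2(example):
    print(row)
```

In this example the third row is the mod-2 sum of the first two, so elimination turns it into an all-zero row at the bottom. The three nested loops (columns, rows, entries of a row) are where the cubic count comes from.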


18.12.3 Back Substitution
Back Substitution for Decimal Matrices 


It is easiest (and for us, enough) to explain back substitution in the case when the
number of linearly independent equations equals the number of unknowns, n. Let’s
say we begin with the system of equations in matrix form,


\[ \begin{pmatrix} c_{00} & c_{01} & \cdots & c_{0(n-1)} \\ c_{10} & c_{11} & \cdots & c_{1(n-1)} \\ \vdots & \vdots & \ddots & \vdots \\ c_{(n-1)0} & c_{(n-1)1} & \cdots & c_{(n-1)(n-1)} \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix} \;=\; \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix}, \]


assumed to be of maximal rank, n, meaning all rows (or columns) are linearly independent.
This would result in a reduced row echelon form


\[ \left( \begin{array}{ccccccc|c} 1 & c'_{01} & c'_{02} & c'_{03} & \cdots & c'_{0(n-2)} & c'_{0(n-1)} & b'_0 \\ 0 & 1 & c'_{12} & c'_{13} & \cdots & c'_{1(n-2)} & c'_{1(n-1)} & b'_1 \\ 0 & 0 & 1 & c'_{23} & \cdots & c'_{2(n-2)} & c'_{2(n-1)} & b'_2 \\ 0 & 0 & 0 & 1 & \cdots & c'_{3(n-2)} & c'_{3(n-1)} & b'_3 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & b'_{n-1} \end{array} \right) \]


where the $c'_{kj}$ and $b'_k$ are not the original constants in the equation, but the ones
obtained after applying GE. In reduced echelon form, we see that the (n - 1)st
unknown, $x_{n-1}$, can be read off immediately, as


\[ x_{n-1} \;=\; b'_{n-1} . \]

From here, we “substitute back” into the (n - 2)nd equation to get,


\[ x_{n-2} + b'_{n-1}\, c'_{(n-2)(n-1)} \;=\; b'_{n-2} , \]


which can be solved for $x_{n-2}$ (one equation, one unknown). Once solved, we substitute
these numbers into the equation above, getting another equation with one unknown.
This continues until every row displays the answer to its corresponding $x_k$, and the
system is solved.
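

The procedure just described can be sketched in a few lines. This is only an illustration under the same assumption made above (a square system already in reduced echelon form, with 1s on the diagonal); the function name `back_substitute` and the small 3 × 3 example are ours.

```python
from fractions import Fraction


def back_substitute(C, b):
    """Solve C x = b, where C is n x n, upper triangular, with 1s on the diagonal."""
    n = len(b)
    x = [Fraction(0)] * n
    # Work from the bottom row up; row k only involves x[k], x[k+1], ..., x[n-1].
    for k in range(n - 1, -1, -1):
        total = Fraction(b[k])
        for j in range(k + 1, n):
            total -= Fraction(C[k][j]) * x[j]  # one multiplication, one addition
        x[k] = total
    return x


C = [[1, 2, -1],
     [0, 1, 3],
     [0, 0, 1]]
b = [3, 5, 2]
print(back_substitute(C, b))
```

The bottom row costs nothing, the row above it costs one multiply-add, the next one two, and so on, which is the pattern counted in the cost discussion that follows.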


The Cost of Decimal-Based Back Substitution 


The bottom row has no operations: it is already solved. The second-from-bottom has
one multiplication and one addition. The third-from-bottom has two multiplications
and two additions. Continuing in this manner and adding things up, we get


\[ 1 + 2 + 3 + \cdots + (n-1) \;=\; \frac{(n-1)\,n}{2} \]


additions and the same number of multiplications, producing an overall complexity of
$O(n^2)$ operations. As noted, the time complexity of the addition and multiplication
algorithms would degrade by a factor of $(\log m)^2$, m being the largest number to be
multiplied, making the overall “bit” complexity $O\big(n^2 (\log m)^2\big)$.


The Cost of Mod-2 Back Substitution 


For mod-2 systems, we have no actual multiplication (the $b'_k$ are either 1 or 0) and
the additions are single-bit mod-2 additions and therefore, constant time. Thus, each
of the approximately n(n + 1)/2 (= $O(n^2)$) addition operations in back substitution
uses a constant time addition, leaving the total cost of back substitution un-degraded
at $O(n^2)$.
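

Here is the mod-2 version as a minimal sketch. The function name `back_substitute_mod2` is ours, the multiplication is replaced by a bitwise AND and the addition by XOR, and the 3-bit example (with a = 111) is an illustration we made up, not one taken from the text.

```python
def back_substitute_mod2(W, b):
    """Solve W a = b (mod 2) for an n x n 0/1 matrix W with 1s on its diagonal
    and 0s below it."""
    n = len(b)
    a = [0] * n
    for k in range(n - 1, -1, -1):
        bit = b[k]
        for j in range(k + 1, n):
            bit ^= W[k][j] & a[j]  # constant-time AND and XOR on single bits
        a[k] = bit
    return a


# Rows 0 and 1 are orthogonal (mod 2) to a = 111; row 2 is not.
W = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
b = [0, 0, 1]
print(back_substitute_mod2(W, b))
```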


18.12.4 The Total Cost of the Classical Techniques for Solving Mod-2 Systems

We have shown that Gaussian elimination and back substitution in the mod-2 environment
have time complexities, $O(n^3)$ and $O(n^2)$, respectively. To solve a system of
n mod-2 equations the two methods can be executed in series, so the dominant $O(n^3)$
will cover the entire expense.


However, this isn’t exactly how Simon’s algorithm uses these classical tools, so we 
need to count in a way that precisely fits our needs. 


18.13 Applying GE and Back Substitution to Simon’s Problem


We now show how these time tested techniques can be used to evaluate the classical
post processing costs in Simon’s algorithm. The eventual answer we will get is this:


They will be used in-series, so we only need to count the larger of the two, $O(n^3)$,
and these algorithms are applied once for each quantum sample, already computed
to be O(n), resulting in a nested count of $O(n^4)$.


18.13.1 Linear Independence 


Gaussian elimination, without back substitution, can be used to determine whether
a vector z is linearly independent of a set $W = \{w_k\}$ of m vectors which is, itself,
known to be linearly independent. Therefore, we will apply GE after taking each new
sample z, and if we find z to be independent of the existing W, we'll add it to W
(increasing its size by one) and go on to get our next z. If z turns out to not be
independent of W, we'll throw it away and get another z without changing W. This
process continues until either W contains the maximum n - 1 vectors (the most that
can be orthogonal to the unknown period a) or we have exceeded the n + T sampling
limit.

The following process demonstrates one way to apply GE to determine linear
independence and build an increasingly larger set of independent vectors in W.


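1. Stack the $w_k$ on top of one another to form an m × n matrix, which we assume
to already be in reduced echelon form (our construction will guarantee it):


\[ \begin{pmatrix} 1 & w_{01} & w_{02} & w_{03} & \cdots & w_{0(n-3)} & w_{0(n-2)} & w_{0(n-1)} \\ 0 & 0 & 1 & w_{13} & \cdots & w_{1(n-3)} & w_{1(n-2)} & w_{1(n-1)} \\ 0 & 0 & 0 & 1 & \cdots & w_{2(n-3)} & w_{2(n-2)} & w_{2(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & w_{(m-1)(n-2)} & w_{(m-1)(n-1)} \end{pmatrix} \]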


2. Observations. Notice that m < (n — 1), since the vectors in W are indepen- 
dent, by assumption, and if m were equal to n — 1, we would already have a 
maximally independent set of vectors known to be orthogonal to a and would 
have stopped sampling the circuit. That means that there are at least two more 
columns than there are rows: the full space is n-dimensional, and we have n — 2 
or fewer linearly independent vectors so far. 


As a consequence, one or more rows (the second, in the above example) skips 
to the right more than one position relative to the row above it and/or the final 
row in the matrix has its leading 1 in column n — 3 or greater. 


3. Put z at the bottom of the stack and re-apply GE. 


• If z is linearly independent of W this will result in a new non-zero
vector row. Replace the set W with the new set whose coordinates are the
rows of the new reduced-echelon matrix. These new row vectors are in the
span of the old W plus z added. [Caution. It is possible that none of
the original $w_k$ or z explicitly appear in the new rows, which come from
GE applied to those vectors. All that matters is that the span of the new
rows is the same as the span of W plus z, which GE ensures.] We have
increased our set of linearly independent vectors by one.


• If z is not linearly independent of W the last row will contain all 0s.
Recover the original W (or, if you like, replace it with the new reduced
matrix row vectors, leaving off the final 0 row; the two sets will span the
same space and be linearly independent). You are ready to grab another z
based on the outer-loop inside which this linear-independence test resides.


[Exercise. Explain why all the claims in this step are true.]


4. Once n - 1 vectors populate the set $W = \{w_0, \ldots, w_{n-2}\}$, the process is
complete. We call the associated row-reduced matrix W; it satisfies


\[ W \cdot a \;=\; 0 , \]


where a is our unknown period and 0 is the (n - 1)-dimensional 0-vector in
$(\mathbb{Z}_2)^{n-1}$. This condenses our system of n - 1 equations, $a \cdot w_k = 0$, into a single
matrix relation.


Cost of Using GE to Determine Linear Independence 


In summary, we have employed the classical algorithm GE to handle our linear-independence
test. GE is $O(n^3)$ and is applied once for each pass of our quantum
circuit. Since that circuit was O(n), the new total cost (so far) for Simon’s algorithm
is $O(n^4)$. (We'll do better.)


18.13.2 Completing the Basis with an nth Vector Not Orthogonal to a


Once we are finished assembling the set of n - 1 vectors, $W = \{w_0, \ldots, w_{n-2}\}$, we
will add a final nth vector, $w_{n-1}$, to the set: a vector which is not orthogonal to a
(remember, a itself might not be such a vector; it will be orthogonal to itself exactly half
the time in our strange mod-2 dot product). $w_{n-1}$ can be quickly determined and
inserted into the correct position of the reduced W (~ W) matrix as follows.


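1. Starting from the top row, $w_0$, look for the last (lowest) $w_k$ which has its leading
1 in column k (i.e., it has its leading 1 on W’s diagonal, but $w_{k+1}$, directly below
it, has a 0 in its (k + 1)st position).


• If such a $w_k$ can be found, define


\[ w_{n-1} \;\equiv\; 2^{\,n-2-k} \;\equiv\; (0, 0, \ldots, 0, \underset{\uparrow}{1}, 0, \ldots, 0), \qquad \text{with the 1 in the } (k+1)\text{st position from the left,} \]


and place this new $w_{n-1}$ directly below $w_k$, pushing all the vectors in the
old rows k + 1 and greater down to accommodate the insertion. Call the
augmented matrix $W'$.


Before:

\[ W \;=\; \begin{pmatrix} 1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\ 0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\ 0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\ 0 & 0 & 0 & 0 & 1 & w_{35} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad (\text{third row} = w_k) \]

After:

\[ W' \;=\; \begin{pmatrix} 1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\ 0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\ 0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & w_{35} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad (\text{fourth row} = w_{n-1}) \]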


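• If there is no such $w_k$, then either


- W has all 1s on its diagonal, or
- W has all 0s on its diagonal.


• If W has all 1s on its diagonal, the final row of W is of the form

\[ (0, 0, \ldots, 0, 1, *) . \]

Define

\[ w_{n-1} \;\equiv\; 2^0 \;=\; 1 \;\equiv\; (0, 0, \ldots, 0, 0, 1), \]


and place this new $w_{n-1}$ after $w_{n-2}$, the last old row, making $w_{n-1}$ the new
bottom row of W. Call the augmented matrix $W'$.


Before:

\[ W \;=\; \begin{pmatrix} 1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\ 0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\ 0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\ 0 & 0 & 0 & 1 & w_{34} & w_{35} \\ 0 & 0 & 0 & 0 & 1 & w_{45} \end{pmatrix} \qquad (\text{last row} = w_{n-2}) \]

After:

\[ W' \;=\; \begin{pmatrix} 1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\ 0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\ 0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\ 0 & 0 & 0 & 1 & w_{34} & w_{35} \\ 0 & 0 & 0 & 0 & 1 & w_{45} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad (\text{last row} = w_{n-1}) \]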

• Exercise. If W has all 0s on its diagonal, explain what to do, showing
example matrices.


[Exercise. Prove all outstanding claims in step 1.]

2. Into the (n - 1)-dimensional 0 vector comprising the RHS of the equation

\[ W \cdot a \;=\; 0 , \]

insert a 1 into the position corresponding to the new row in W. Push any 0s
down, as needed, to accommodate this 1. It will now be an n-dimensional vector
corresponding to $2^k$ for some k,


\[ (0, \ldots, 0, 1, 0, \ldots, 0)^t . \]


That will produce a full set of n linearly independent vectors for $(\mathbb{Z}_2)^n$ in reduced
echelon form, which we call $W'$.
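

One way to phrase all three cases at once is to notice that an (n - 1) × n echelon matrix with no zero rows has exactly one column without a leading 1, and the new row is the unit vector sitting in that column. The sketch below takes that view; it is our restatement, not the procedure as worded above, and the names `leading_one` and `complete_basis` are ours.

```python
def leading_one(row):
    """Column index of the first 1 in a 0/1 row, or None for an all-zero row."""
    for col, bit in enumerate(row):
        if bit:
            return col
    return None


def complete_basis(W):
    """W: n-1 rows of length n, in row echelon form with no all-zero rows.
    Returns (W_prime, rhs): the n x n matrix with the new unit row inserted,
    and the right-hand side with a 1 in the new row's position."""
    n = len(W[0])
    pivots = [leading_one(row) for row in W]
    # Exactly one column has no leading 1; the new row goes there.
    missing = next(col for col in range(n) if col not in pivots)
    new_row = [0] * n
    new_row[missing] = 1
    # Rows 0..missing-1 hold the pivots in columns 0..missing-1, so insert at `missing`.
    W_prime = W[:missing] + [new_row] + W[missing:]
    rhs = [0] * n
    rhs[missing] = 1
    return W_prime, rhs


W = [[1, 0, 1, 1, 0, 1],
     [0, 1, 1, 0, 0, 1],
     [0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0, 1]]
W_prime, rhs = complete_basis(W)
print(W_prime[3], rhs)
```

With this 5 × 6 example the leading 1s sit in columns 0, 1, 2, 4 and 5, so the new row lands in position 3, matching the shape of the first "Before/After" case above.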


Cost of Completing the Basis 
The above process consists of a small number of O(n) operations, each occurring in
series with each other and with the loops that come before. Therefore, its O(n) adds
nothing to the previous complexity, which now stands at $O(n^4)$.




18.13.3. Using Back-Substitution to Close the Deal 
We are, metaphorically, 99% of the way to finding the period of f. We want to solve


\[ W' \cdot a \;=\; (0, \ldots, 0, 1, 0, \ldots, 0)^t , \]


but $W'$ is already in reduced-echelon form. We need only apply mod-2 back-substitution
to extract the solution vector, a.


Cost of Back Substitution to the Algorithm 


This is an $O(n^2)$ activity done in series with the loops that come before. Therefore,
its $O(n^2)$ adds nothing to the previous complexity, which still stands at $O(n^4)$.


18.13.4 The Full Cost of the Hidden Classical Algorithms 


We have accounted for the classical cost of testing the linear independence and solving
the system of equations. In the process, we have demonstrated that it increases the
complexity by a factor $n^3$, making the full algorithm $O(n^4)$, not counting the oracle,
whose complexity is unknown to us.


We will do better, though, by leveraging mod-2 shortcuts that are integrated into 
Simon’s algorithm. You'll see. 


The melody that keeps repeating in our head, though, is the footnote that this
entire analysis is relative to the quantum oracle $U_f$, the reversible operator associated
with the $(\mathbb{Z}_2)^n$ periodic function, f. Its complexity is that of the black box for f, itself.
We do not generally know that complexity, and it may well be very bad. Fortunately, 
in some special cases of great interest, we know enough about the function to be able 
to state that it has polynomial time complexity, often of a low polynomial order. 


18.14 Adjusted Algorithm 


18.14.1 New Linear Independence Step 


We now integrate Gaussian elimination into Simon’s algorithm during the test for 
linear independence. Here is the step, as originally presented: 




3. Use a classical algorithm to determine whether or not z is linearly dependent
on the vectors in W.


• If it is independent, name it $w_j$, where j is the number of elements already
stored in W, and add $w_j$ to W.


- if j = n - 2, we have n - 1 linearly-independent vectors in W and
are done; break the loop.


• If it is not independent (which includes the special case z = 0, even when
W is still empty), then continue to the next pass of the loop.


The entire block can be replaced by the following, which not only tests for linear 
independence, but keeps W’s associated matrix, W, in reduced echelon form and does 
so leveraging the special simplicity afforded by mod-2 arithmetic. 


The algorithm is more readable if we carry an example along. Assume that, to 
date, W is represented by 


ooor 
Sore. 
ooor 
SCOrE 
oaooroeo 
a ee et 
FOrFO 
oaornoe 
FPOoOOH 


and we just pulled 


from our circuit. 
New Step 3 


3. Loop (new inner loop) until either z has been added to W (~ W)
or z = 0 is produced.


(a) m ← most significant bit (MSB) position of z that contains a 1.
(O(n))
[In our example, m = bit #5 (don’t forget that we count from the right,
not the left).]


(b) Search W for a $w_k$ row with the same non-0 MSB position, m.
(O(n))
[In our example, $w_0$ is the (only) row with an MSB of 1 ≠ 0 in bit #5.]


• If a row vector $w_k$ with the same non-0 MSB is not found, insert
z between the two rows of W that will guarantee the preservation
of its reduced echelon form, effectively adding it to W.
End this loop.

[In our example, this case does not occur.]


• If a row vector $w_k$ with the same non-0 MSB is found, replace

\[ z \;\leftarrow\; z \oplus w_k , \]

effectively replacing its old MSB with a 0. Continue to next
pass of this loop. (O(n) for ⊕)
[In our example,

\[ z \;\leftarrow\; z \oplus w_0 \;=\; (000001101). ] \]


[Exercise. Finish this example, repeating the loop as many times as needed, and
decide whether the original z produces an augmented W or turns z into 0.]


[Exercise. Try the loop with z = 000001100.]
[Exercise. Try the loop with z = 010001100.]
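

The new step 3 can be sketched compactly if each vector is stored as a Python integer whose bits are its coordinates, so that ⊕ is the `^` operator. Two choices here are ours, made for brevity: a dictionary keyed by MSB position stands in for the ordered matrix (sorting the keys in decreasing order recovers the rows top to bottom), and the built-in `int.bit_length` finds the MSB instead of an explicit O(n) scan. The 4-bit vectors are a made-up example.

```python
def try_add(W, z):
    """Inner loop of the new step 3.
    W maps an MSB position (counted from the right) to the row holding that MSB.
    Returns True if (a reduced form of) z was added to W, False if z reduced to 0."""
    while z != 0:
        m = z.bit_length() - 1  # position of the most significant 1 in z
        if m not in W:
            W[m] = z            # no row shares this MSB: insert, echelon form preserved
            return True
        z ^= W[m]               # same MSB found: z <- z XOR w_k clears that bit
    return False                # z became 0, so it depended on W


W = {}
print(try_add(W, 0b0110))  # added
print(try_add(W, 0b0101))  # reduced once, then added as 0011
print(try_add(W, 0b0101))  # reduces to 0, rejected
print([format(W[m], "04b") for m in sorted(W, reverse=True)])
```

Each pass either ends the loop or moves the MSB of z at least one position to the right, which is the reason the loop runs at most n - 1 times.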


Why this Works 


In the proposed loop, we are considering whether or not to add some z to our set, W. 


• If we do not find a $w_k$ whose MSB matches the MSB of z, then adding z to
W in the prescribed fashion produces an additional row for our matrix, while
preserving its echelon form. Therefore, the rows are still linearly independent.
(This was an exercise from the Gaussian elimination section.) We have increased
W by one vector, and thereby broadened its span to a subspace with
twice as many vectors, all of which are orthogonal to a.


[Exercise. Fill in the details.]


• Say we do find a $w_k$ whose MSB matches the MSB of z. $z \oplus w_k$ is, by definition,
in the span of $W \cup \{z\}$. Since z is orthogonal to a, so, too, is the new $z = z \oplus w_k$,
but the new z has more 0s, bringing us one loop pass closer to either adding it to W
(the above case) or arriving at a termination condition in which we ultimately
produce z = 0, which is never independent of any set, so neither were the previous
zs that got us to this point (including the original z produced by the circuit).


[Exercise. Fill in the details, making sure to explain why z and $z \oplus w_k$ are
either a) both linearly independent of W or b) both not.]


Cost of New Step 3 


This step has a new inner-loop, relative to the outer quantum sampling loop, which
has, at most, n - 1 passes (we move the MSB to the right each time). Inside the loop
we apply some O(n) operations in series with each other. Therefore, the total cost of
the Step 3 loop is $O(n^2)$, which includes all arithmetic.




Summary 


We are not applying GE all at once to a single matrix. Rather, we are doing an $O(n^2)$
operation after each quantum sample that keeps the accumulated W set in eternal
echelon-reduced form. So, it’s a custom $O(n^2)$ algorithm nested within the quantum
O(n) loop, giving an outer complexity of $O(n^3)$.


18.14.2 New Solution of System Step 


Finally, we integrate back-substitution into the final step of the original algorithm.
The original algorithm’s final step was

• Otherwise, we succeeded. Add an nth vector, $w_{n-1}$, which is linearly independent
to this set (and therefore not orthogonal to a, by a previous exercise), done
easily using a simple classical observation, demonstrated below. This produces
a system of n independent equations satisfying


\[ w_k \cdot a \;=\; \begin{cases} 0, & k = 0, \ldots, n-2 \\ 1, & k = n-1 \end{cases} \]

which has a unique non-zero solution.


Replace this with the new final step,


• Otherwise, we succeeded and W is an (n - 1) × n matrix in reduced echelon form.
Add an nth row vector, $w_{n-1}$, which is linearly independent to W’s rows (and
therefore not orthogonal to a), using the process described in Solving the Final
Set of Linear Equations, above. That was an O(n) process that produced an
n × n W, also in reduced echelon form. We now have a system of n independent
equations satisfying


\[ w_k \cdot a \;=\; \begin{cases} 0, & k = 0, \ldots, n-2 \\ 1, & k = n-1 \end{cases} \]

which is already in reduced echelon form. Solve it using only back-substitution,
which is $O(n^2)$.


The take-away is that we have already produced the echelon form as part of the
linear-independence tests, so we are positioned to solve the system using only
back-substitution, $O(n^2)$.


18.14.3 Cost of Adjusted Implementation of Simon’s Algorithm

We detailed two adjustments to the original algorithm. The first was the test for
linear independence using a mod-2 step that simultaneously resulted in GE at the end
of all the quantum sampling. The second was the solution of the system of equations.




Cost of the mod-2 Test for Independence 


As already noted, the step-3 loop is repeated at most n - 1 times, since we push
the MSB of z to the right at least one position each pass. Inside this loop, we have
O(n) operations, all applied in series. The nesting in step 3, therefore, produces an
$O(n^2)$ complexity. Step 3 is within the outer quantum loop, O(n), which brings the
outer-most complexity to $O(n^3)$, so far.


Cost of Solving the System 


We saw that we could solve the system using only back-substitution, $O(n^2)$. This is
done outside the entire quantum loop.


The total cost, quantum + classical, using our fancy footwork is, therefore, $O(n^3)$,
relativized to the oracle.


18.15 Classical Complexity of Simon’s Problem 


Classically, this problem is hard, that is, deterministically, we certainly need a
number of trials that increases exponentially in n to get the period, and even if we
are satisfied with a small error, we would still need to take an exponential number
of samples to achieve that (and not just any exponential number of samples, but a
really big one). Let’s demonstrate all this.


18.15.1 Classical Deterministic Cost 


Recall that the domain can be partitioned (in more than one way) into two disjoint
sets, R and Q,

\[ (\mathbb{Z}_2)^n \;=\; R \,\cup\, Q \;=\; \{\cdots, x, \cdots\} \,\cup\, \{\cdots, x \oplus a, \cdots\}, \]

with f one-to-one on R and Q, individually. We pick xs at random (avoiding duplicates)
and plug each one into f, or a classical oracle of f if you like,


x → [ Classical f ] → f(x).


If we sample f any fewer than (half the domain size) + 1, that is, $(2^n/2) + 1 =
2^{n-1} + 1$, times, we may be unlucky enough that all of our outputs, f(x), are images
of x ∈ R (or all ∈ Q), which would produce all distinct functional values. There is
no way to determine what a is if we don’t get a duplicate output, $f(x') = f(x'')$ for
some distinct $x', x'' \in \mathrm{dom}(f)$. (Once we do, of course, $a = x' \oplus x''$, but until that
time, no dice.)


Therefore, we have to sample $2^{n-1} + 1$ times, exponential in n, to be sure we get
at least one x from R and one from Q, thus producing a duplicate.
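

The deterministic search is simple enough to sketch. Everything named here is an illustration of ours: `make_simon_function` builds one particular function with the promised property (it sends x and x ⊕ a to the same value), and `find_period_classically` queries inputs in increasing order rather than at random, which does not change the worst-case count.

```python
def make_simon_function(a):
    """A toy f on n-bit integers with f(x) = f(x XOR a) and no other collisions."""
    return lambda x: min(x, x ^ a)


def find_period_classically(f, n):
    """Query f until two inputs collide; return (period, number of queries used)."""
    seen = {}  # maps an output value to the input that produced it
    for queries, x in enumerate(range(2 ** n), start=1):
        y = f(x)
        if y in seen:
            return seen[y] ^ x, queries  # a = x' XOR x''
        seen[y] = x
    return 0, 2 ** n  # no collision at all: f was one-to-one


f = make_simon_function(0b101)
print(find_period_classically(f, 3))
```

For this particular f and query order, the first four inputs all land in distinct outputs, so the collision only shows up on the fifth query, which is the worst-case figure computed above for n = 3.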




18.15.2 Classical Probabilistic Cost 


However, what if we were satisfied to find a with a pre-determined probability of
error, ε? This means that we could choose an ε as small as we like and know that if
we took m = m(ε, n) samples, we would get the right answer, a, with probability


\[ P(\text{figure out } a \text{, correctly}) \;\ge\; 1 - \varepsilon . \]


The functional dependence m = m(ε, n) is just a way to express that we are allowed
to let the number of samples, m, depend on both how small an error we want and
also how big the domain of f is. For example, if we could show that $m = 2^{1/\varepsilon}$ worked,
then since that is not dependent on n the complexity would be constant time. On the
other hand, if we could only prove that an $m = n^4\, 2^{1/\varepsilon}$ worked, then the algorithm
would be $O(n^4)$.


The functional dependence we care about does not involve ε, only n, so really, we
are interested in m = m(n).


What we will show is that even if we let m be a particular function that grows
exponentially in n, we won’t succeed. That’s not to say every exponentially increasing
sample size which is a function of n would fail; we already know that if we chose
$2^{n-1} + 1$ we will succeed with certainty. But we’ll see that some smaller exponential
function of n, specifically $m = 2^{n/4}$, will not work, and if that won’t work then no
polynomial dependence on n, which necessarily grows more slowly than $m = 2^{n/4}$,
has a chance, either.


An Upper Bound for Getting Repeats in m Samples 


For the moment, we won't concern ourselves with whether or not m is some function
of $\varepsilon$ or n. Instead, let's compute an upper bound on the probability of getting a
repeat when sampling a classical oracle m times. That is, we'll get the probability
as a function of m, alone. Afterwards, we can stand back and see what kind of
dependence m would require on n in order that the deck be stacked in our favor.

We are looking for the probability that at least two samples $f(x_i)$, $f(x_j)$ are equal
when choosing m distinct inputs, $\{x_0, x_1, \ldots, x_{m-1}\}$. We'll call that event $\mathcal{E}$. The
more specific event that a particular pair of inputs, $x_i$ and $x_j$, yield equal f(x)s, will be
referred to as $\mathcal{E}_{ij}$. Since $\mathcal{E}_{ij}$ and $\mathcal{E}_{ji}$ are the same event, we only have to list it once,
so we only consider the cases where $i < j$. Clearly,

$$\mathcal{E} = \bigcup_{0 \le i < j \le m-1} \mathcal{E}_{ij}.$$


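The probability of a union is the sum of the probabilities of the individual events
minus the probabilities of the various intersections of the individual events,

$$
P(\mathcal{E}) = \sum_{\substack{i,j=0 \\ i<j}}^{m-1} P(\mathcal{E}_{ij})
\;-\; \sum_{\substack{\text{various} \\ \text{combinations}}} P(\mathcal{E}_{ij} \wedge \mathcal{E}_{kl} \cdots)
\;\le\; \sum_{\substack{i,j=0 \\ i<j}}^{m-1} P(\mathcal{E}_{ij}).
$$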


The number of unordered pairs, $\{i, j\}$, $i \ne j$, when taken from m things is (look up
“n choose k” if you have never seen this)

$$\binom{m}{2} = \frac{m!}{2!\,(m-2)!} = \frac{m(m-1)}{2}.$$

This is exactly the number of events $\mathcal{E}_{ij}$ that we are counting since our condition,
$0 \le i < j \le m-1$, is in 1-to-1 correspondence with the set of unordered pairs $\{i, j\}$,
i and j between 0 and m − 1, inclusive and $i \ne j$.


Meanwhile, the probability that an individual pair produces the same f value is
just the probability that we choose the second one, $x_j$, in such a way that it happens
to be exactly $x_i \oplus a$. Since we're intentionally not going to pick $x_i$ a second time this
leaves $2^n - 1$ choices, of which only one is $x_i \oplus a$, so that gives

$$P(\mathcal{E}_{ij}) = \frac{1}{2^n - 1}.$$

Therefore, we've computed the number of elements in the sum, $m(m-1)/2$, as well
as the probability of each element in the sum, $1/(2^n - 1)$, so we plug back into our
inequality to get

$$P(\mathcal{E}) \le \frac{m(m-1)}{2} \cdot \frac{1}{2^n - 1}.$$


[Exercise. We know that when we sample $m = 2^{n-1} + 1$ times, we are certain to
get a duplicate. As a sanity check, make sure that plugging this value of m into the
derived inequality gives an upper bound that is no less than one. Any value ≥ 1 will
be consistent with our result.]
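
The sanity check in the exercise can also be explored numerically. The following is a minimal sketch, not part of the original text; the helper name `repeat_bound` is made up for illustration, and exact rational arithmetic is used so that no rounding clouds the comparison with 1.

```python
from fractions import Fraction

def repeat_bound(m: int, n: int) -> Fraction:
    """Upper bound m(m-1)/2 * 1/(2^n - 1) on the probability of a repeat.

    (Hypothetical helper name; the formula is the inequality derived above.)
    """
    return Fraction(m * (m - 1), 2) / (2**n - 1)

# Plug in m = 2^(n-1) + 1, the sample size that guarantees a duplicate.
for n in range(1, 11):
    m = 2 ** (n - 1) + 1
    bound = repeat_bound(m, n)
    print(n, m, bound, bound >= 1)
```

For every n tried, the bound is at least 1, which is consistent with (but of course weaker than) the certainty we already have for this m.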


This is the first formula we sought. We now go on to see what this implies about 
how m would need to depend on n to give a decent chance of obtaining a. 


What the Estimate Tells Us about m = m(n) 


To get our feet wet, let's imagine the pipe dream that we can use an m that is
independent of n. The bound we proved,

$$P(\mathcal{E}) \le \frac{m(m-1)}{2} \cdot \frac{1}{2^n - 1},$$

tells us that any such m (say an integer $m > 1/\varepsilon^{1000000}$) is going to have an
exponentially small probability as $n \to \infty$. So that settles that, at least.




An Upper Bound for Getting Repeats in $m = 2^{n/4}$ Samples

But, what about allowing m to be a really large proportion of our domain, like
$m = 2^{n/4}$?


That’s an exponential function of n, so it really grows fast. Now we’re at least being 
realistic in our hopes. Still, they are dashed: 


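$$
\begin{aligned}
P(\mathcal{E}) &\le \frac{m(m-1)}{2} \cdot \frac{1}{2^n - 1}
  = \frac{2^{n/4}\left(2^{n/4} - 1\right)}{2} \cdot \frac{1}{2^n - 1} \\
&= \frac{2^{n/2} - 2^{n/4}}{2} \cdot \frac{1}{2^n - 1} \\
&< \frac{1}{2} \cdot \frac{2^{n/2}}{2^n - 1}
  < \frac{1}{2} \cdot \frac{2^{n/2}}{2^n - 2^{n/2}} \\
&= \frac{1}{2} \cdot \frac{2^{n/2}}{2^{n/2}\left(2^{n/2} - 1\right)}
  = \frac{1}{2} \cdot \frac{1}{2^{n/2} - 1} \\
&= \frac{1}{2\left(2^{n/2} - 1\right)},
\end{aligned}
$$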


a probability that still approaches 0 (exponentially fast, unnecessary salt in the
wound) as $n \to \infty$. Although we are allowing ourselves to sample the function, f, at
m = m(n) distinct inputs, where m(n) grows exponentially in n, we cannot guarantee
a reasonable probability of getting a repeat f(x) for all n. The probability always
shrinks to 0 when n gets large. This is exponential time complexity.
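
To see the estimate in action, here is a small Monte Carlo sketch (an illustration added here, not part of the original argument). It assumes a stand-in for Simon's function: each pair {x, x ⊕ a} is labeled by its smaller member, which is one valid 2-to-1 function with period a. The names `has_repeat` and `estimate`, the trial count and the seed are all arbitrary choices.

```python
import random

def has_repeat(n: int, m: int, rng: random.Random) -> bool:
    """Sample m distinct n-bit inputs; report whether two share an f value."""
    a = rng.randrange(1, 2**n)           # hidden nonzero period (assumed random)
    xs = rng.sample(range(2**n), m)      # m distinct inputs
    seen = set()
    for x in xs:
        label = min(x, x ^ a)            # stand-in for f(x): f(x) = f(x XOR a)
        if label in seen:
            return True
        seen.add(label)
    return False

def estimate(n: int, trials: int = 20000, seed: int = 0):
    """Return (m, observed repeat frequency, upper bound) for m = 2^(n/4)."""
    rng = random.Random(seed)
    m = int(2 ** (n / 4))
    hits = sum(has_repeat(n, m, rng) for _ in range(trials))
    bound = m * (m - 1) / 2 / (2**n - 1)
    return m, hits / trials, bound

for n in (8, 12, 16):
    print(n, *estimate(n))
```

Under these assumptions the observed frequency of a repeat should sit near or below the bound, and both should fall quickly as n grows.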


This problem is hard, even probabilistically, in classical terms. Simon’s algorithm 
gives a (relativized) exponential speed-up over classical methods. 




Chapter 19 


Real and Complex Fourier Series 


19.1 The Classical Path to Quantum Fourier Transforms


Our previous quantum algorithms made propitious use of the nth order Hadamard
transform, $H^{\otimes n}$, but our next algorithm will require something a little higher octane.
The fundamental rules apply: a gate is still a gate and must be unitary. As such, it
can be viewed as a basis change at one moment and a tool to turn a separable input
state like $|0\rangle^n$ into a superposition, the next. We'll have occasion to look at it in both
lights.


Our objective in the next four lessons is to study the quantum Fourier transform
a.k.a. the QFT. This is done in three classical chapters and one quantum chapter.
Reading and studying all three classical chapters will best prepare you for the fourth
QFT chapter, but you can cherry pick, skim, or even skip one or more of the three
depending on your interest and prior background.


An aggressive short cut might be to try starting with the third, the Discrete and
Fast Fourier Transforms, read that for general comprehension, then see if it gives you
enough of a foundation to get through the fourth QFT chapter.


If you have time and want to learn (or review) the classic math that leads to
QFT, the full path will take you through some beautiful topics:


This Chapter   [Real Fourier Series → Complex Fourier Series]
Chapter 20     [Continuous Fourier Transform]
Chapter 21     [Discrete Fourier Transform → Fast Fourier Transform]
Chapter 22     [Quantum Fourier Transform]


If you do decide to invest time in the early material, you will be incidentally adding
knowledge applicable to many fields besides quantum computing. You will also have
some additional insight into the QFT.

In this first lesson we introduce the concepts frequency and period and show how 
a periodic function can be built using only sines and cosines (real Fourier series) or 
exponentials (complex Fourier series). The basic concepts here make learning the 
Fourier transform (next chapter) very intuitive. 


19.2 Periodic Functions and Their Friends 


Fourier Series apply to functions that are either periodic or are only defined on 
a bounded interval. We’ll first define the terms, periodicity, bounded domain and 
compact support, then we can get on with Fourier Series. 


19.2.1 Periodic Functions over R 


A function f whose domain is (nearly) all the real numbers, R, is said to be periodic
if there is a unique smallest positive real number T, such that

$$f(x + T) = f(x), \quad \text{for all } x.$$

The number, T, is called the period of f.


(“Unique smallest” is a little redundant, but it condenses two ideas. Officially, 
we'd say that there must exist some T > 0 with the stated property and, if there did, 
we would call the smallest such T (technically the GLB or greatest lower bound) its 
period. ) 

The simplest examples we can conjure up instantly are the functions sine and 
cosine. Take sine, for example: 


$$\sin(x + 2\pi) = \sin(x), \quad \text{for all } x,$$

and $2\pi$ is the smallest positive number with this property, so $T = 2\pi$ is its period. $4\pi$
and $12\pi$ satisfy the equality, but they're not as small as $2\pi$, so they're not periods.


Its graph manifests this periodicity in the form of repetition (Figure 19.1). 


I said that the domain could be “nearly” all real numbers, because it's fine if the
function blows up or is undefined on some isolated points. A good example is the
tangent function, $y = \tan x$, whose period is half that of $\sin x$ but is undefined for
$\pm\pi/2$, $\pm 3\pi/2$, etc. (Figure 19.2).




Figure 19.1: Graph of the quintessential periodic function y = sin x
(period = $2\pi$)


Figure 19.2: The function y = tan x blows up at isolated points but is still periodic
(with period $\pi$)


19.2.2 Functions with Bounded Domain or Compact Support 


A function which is defined only over a bounded interval of the real numbers (like
(0, 100] or $[-\pi, \pi]$) is said to have bounded domain. An example is:

$$
f(x) = \begin{cases}
x^2, & \text{if } x \in [-1, 3) \\
\text{undefined}, & \text{otherwise}
\end{cases}
$$


Figure 19.3: Graph of a function defined only for x € [—1,3) 


[Note the notation [−1, 3) for the “half-open” interval that contains the left
endpoint, −1, but not the right endpoint, 3. We could have included 3 and not included
−1, included both or neither. It all depends on our particular goal and function.
Half-open intervals, closed on the left and open on the right, are the most useful to
us.]

A subtly different concept is that of compact support. A function might be defined 
on a relatively large set, say all (or most) real numbers, but happen to be zero outside 
a bounded interval. In this case we prefer to say that it has compact support. 


The previous example, extended to all R, but set to zero outside [−1, 3) is

$$
f(x) = \begin{cases}
x^2, & \text{if } x \in [-1, 3) \\
0, & \text{otherwise}
\end{cases}
$$


(See Figure 19.4) 


Terminology 


The support of the function is the closure of the domain where $f \ne 0$. In the last
function, although f is non-zero only for [−1, 0) ∪ (0, 3), we include the two points 0
and 3 in its support since they are part of the closure of the set where f is non-zero.
(I realize that I have not defined closure, and I won't do so rigorously. For us, closure
means adding back any points which are “right next” to places where f is non-zero,
like 0 and 3 in the last example.)


Figure 19.4: Graph of a function defined everywhere, but whose support is [—1, 3], 
the closure of [—1,0) U (0,3) 


19.2.3 The Connection Between Periodicity and Bounded Domain


If a function is periodic, with period T, once you know it on any half-open interval of
length T, say [0, T) or [−T/2, T/2), you automatically know it for all x, compliments
of f(x) = f(x + T). So we could restrict our attention to the interval [−T/2, T/2),
imagining, if it suited us, that the function was undefined off that interval. Our
understanding of the function on this interval would tell us everything there is to
know about the function elsewhere, since the rest of the graph of the function is just
a repeated clone of what we see on this small part.

Likewise, if we had a (non-periodic) function with bounded domain, say [a, b], we
could throw away b to make it a half-open interval [a, b) (we don't care about f at one
point, anyway). We then convert that to an induced periodic function by insisting
that f(x) = f(x + T), for T = b − a. This defines f everywhere off that interval, and
the expanded function agrees with f on its original domain, but is now periodic with
period T = b − a.
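
The induced periodic function is easy to express in code. The sketch below is only an illustration under stated assumptions: `periodic_extension` is a made-up helper name, and the sample function x² on [−1, 3) is borrowed from the bounded-domain example earlier in this section.

```python
def periodic_extension(f, a: float, b: float):
    """Return g with g(x) = f(x) on [a, b) and g(x + T) = g(x), T = b - a."""
    T = b - a

    def g(x: float) -> float:
        # Remove whole multiples of T so the argument lands back in [a, b).
        return f(a + (x - a) % T)

    return g

# x^2 on [-1, 3) becomes a function of period T = 4 on the whole real line.
g = periodic_extension(lambda x: x * x, -1.0, 3.0)
print(g(0.5), g(4.5), g(-3.5))   # three inputs that differ by multiples of 4
```

Note that the right endpoint b is never evaluated directly: g(b) wraps around to f(a), which is the "throw away b" step described above.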

As a result of this duality between periodic functions and functions with bounded 
domain, I will be interchanging the terms periodic and bounded domain at will over 
the next few sections, choosing whichever one best fits the context at hand. 


19.3 The Real Fourier Series of a Real Periodic Function


19.3.1 Definitions 


Figure 19.5: A periodic function that can be expressed as a Fourier series
(period = $2\pi$)


Figure 19.6: A function with bounded domain that can be expressed as a Fourier
series (support width = $2\pi$)




Until further notice we confine ourselves to functions with domain $\subseteq \mathbb{R}$ and range
$\subseteq \mathbb{R}$.

Any “well-behaved” function of the real numbers that is either periodic (See
Figure 19.5), or has bounded domain (See Figure 19.6), can be expressed as a sum of
sines and cosines. This is true for any period or support width, T, but we normally
simplify things by taking $T = 2\pi$.


The Real Fourier Series. The real Fourier series of a well-behaved
periodic function with period $2\pi$ is the sum

$$f(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx.$$


The sum on the RHS of this equation is called the Fourier Series of the function f.
The functions of x (that is, $\{\sin nx\}_n$, $\{\cos nx\}_n$ and the constant function 1/2) that
appear in the sum are sometimes called the Fourier basis functions.


Study this carefully for a moment. There is a constant term out front ($a_0/2$),
which simply shifts the function's graph up or down, vertically. Then, there are two
infinite sums involving $\cos nx$ and $\sin nx$. Each term in those sums has a coefficient
(some real number $a_n$ or $b_n$) in front of it. In thirty seconds we'll see what all that
means.


[The term well-behaved could take us all week to explore, but every function
that you are likely to think of, that we will need, or that comes up in physics and
engineering, is almost certainly going to be well-behaved.]


19.3.2 Interpretation of the Fourier Series 


Each sinusoid in the Fourier series has a certain frequency associated with it: the n
in $\sin nx$ or $\cos nx$. The larger the n, the higher the frequency of that sine or cosine.
(See Figure 19.7)


Of course, not every term will have the same “amount” of that particular
frequency. That's where the coefficients in front of the sines and cosines come into
play. The way we think about the collection of coefficients, $\{a_n, b_n\}$, in the Fourier
expansion is summarized in the bulleted list, below.


• When small-n coefficients like $a_0$, $a_1$, $b_1$ or $b_2$ are large in magnitude, the function
  possesses significant low frequency characteristics (visible by slowly changing,
  large curvature in the graph of f).


• When the higher-n coefficients like $a_{50}$, $b_{90}$ or $b_{1000}$ are large in magnitude, the
  function has lots of high frequency characteristics (busy squiggling) going on.


• The coefficients, $\{a_n\}$ and $\{b_n\}$, are often called the weights or amplitudes of
  their respective basis functions (in front of which they stand). If $|a_n|$ is large,
  there's a “lot of” $\cos nx$ needed in the recipe to prepare a meal of f(x) (same
  goes for $|b_n|$ and $\sin nx$). Each coefficient adds just the right amount of weight of
  its corresponding sinusoid to build f.


Figure 19.7: A low frequency (n = 1: sin x) and high frequency (n = 20: sin 20x)
basis function in the Fourier series


• As mentioned, the functions $\{\sin nx\}$ and $\{\cos nx\}$ are sometimes called the
  Fourier basis functions, at other times the normal modes, and in some contexts
  the Fourier eigenfunctions. Whatever we call them, they represent the individual
  ingredients used to build the original f out of trigonometric objects, and
  the weights instruct the chef how much of each function to add to the recipe:
  “a pinch of cos 3x, a quart of sin 5x, three tablespoons of sin 17x”, etc.


• Caution: A sharp turn (f′ blows up or there is a jump discontinuity) at even a
  single domain point is a kind of squiggliness, so the function may appear smooth
  except for one or two angled or cornered points, but those points require lots of
  high frequencies in order to be modeled by the Fourier series.




19.3.3 Example of a Fourier Series

To bring all this into focus, we look at the Fourier series of a function that is about
as simple as you can imagine,

$$f(x) = x, \quad x \in [-\pi, \pi).$$

f's graph on the fundamental interval $[-\pi, \pi)$ is shown in Figure 19.8. When viewed
as a periodic function, f will appear as in Figure 19.9. Either way, we represent it
as the following Fourier series

$$x = \sum_{n=1}^{\infty} (-1)^{n+1} \left(\frac{2}{n}\right) \sin nx.$$


Figure 19.9: f(x) = x as a periodic function with fundamental interval $[-\pi, \pi)$


The expression of such a simple function, x, as the infinite sum of sines should
strike you as somewhat odd. Why do we even bother? There are times when we
need the Fourier expansion of even a simple function like f(x) = x in order to fully
analyze it. Or perhaps we wish to build a circuit to generate a signal like f(x) = x
electronically in a signal generator. Circuits naturally have sinusoids at their disposal,
constructed by squeezing, stretching and amplifying the 60 Hz signal coming from the
AC outlet in the wall. However, they don't have an f(x) = x signal, so one must
build it from sinusoids, and the Fourier series provides the blueprint for the circuit.


Why is the Fourier series of f(x) = x so complicated? First, f has a jump
discontinuity (a sharp edge) so it takes a lot of high frequency components to shape
it. If you look at the periodic sawtooth shape of f(x) = x, you'll see these sharp
points. Second, f's graph is linear (but not constant). Crafting a non-horizontal
straight line from curvy sines and cosines requires considerable tinkering.




Finite Approximations 


While the Fourier sum is exact (for well-behaved f) the vagaries of hardware require
that we merely approximate it by taking only a partial sum that ends at some finite
$n = N < \infty$. For our f under consideration, the first 25 coefficients of the sines are
shown in Figure 19.10 and graphed in Figure 19.11.


 2,    −1,    2/3,  −1/2,  2/5,  −1/3,  2/7,  −1/4,  2/9,  −1/5,  2/11,  −1/6,  2/13,
−1/7,  2/15, −1/8,   2/17, −1/9,  2/19, −1/10, 2/21, −1/11, 2/23, −1/12, 2/25


Figure 19.10: First 25 Fourier coefficients of f(x) = x 
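
The list in Figure 19.10 follows directly from the series for f(x) = x, whose sine coefficients are $b_n = (-1)^{n+1}\, 2/n$. As a quick sketch (the function name `b` and the use of exact fractions are choices made here, not something taken from the text), they can be regenerated as follows:

```python
from fractions import Fraction

def b(n: int) -> Fraction:
    """Sine coefficient of f(x) = x on [-pi, pi): b_n = (-1)^(n+1) * 2/n."""
    return Fraction((-1) ** (n + 1) * 2, n)

coeffs = [b(n) for n in range(1, 26)]
print(", ".join(str(c) for c in coeffs))
```

The signs alternate and the magnitudes decay only like 1/n, which is why so many terms are needed near the jump.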


Figure 19.11: Graph of the Fourier coefficients of f(x) = x 


The Spectrum 


Collectively, the Fourier coefficients (or their graph) are called the spectrum of f. It is
a possibly infinite list (or graph) of the weights of the various frequencies contained
in f.

Viewed in this way, the coefficients, themselves, represent a new function, F(n).


The Fourier Series as an Operator Mapping Functions to Functions 


The Fourier mechanism is a kind of operator, FS, applied to f(x) to get a new 
function, F'(n), which is also called the spectrum. 


$$f(x) \;\xrightarrow{\;FS\;}\; F(n)$$


The catch is, this new function, F, is only defined on the non-negative integers. In
fact, if you look closely, it's really two separate functions of integers, $a(n) = a_n$ and
$b(n) = b_n$. But that's okay; we only want to get comfortable with the idea that
the Fourier “operator” takes one function, f(x), domain $\mathbb{R}$, and produces another
function, its spectrum F(n), domain $\mathbb{Z}_{\ge 0}$.

$$
\begin{aligned}
f &: \mathbb{R} \to \mathbb{R} \\
F &: \mathbb{Z}_{\ge 0} \to \mathbb{R} \\
f &\;\overset{FS}{\longmapsto}\; F
\end{aligned}
$$


F contains every ounce of information of the original f, only expressed in a different 
form. 


Computing Fourier Coefficients 


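The way to produce the Fourier coefficients, $\{a_n, b_n\}$, of a function, f, is through these
easy formulas (that I won't derive),

$$
\begin{aligned}
a_0 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx, \\
a_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx, \quad n > 0, \quad \text{and} \\
b_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx, \quad n > 0.
\end{aligned}
$$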
They work for functions which have period $2\pi$ or bounded domain $[-\pi, \pi)$. For some
other period T, we would need to multiply or divide by $T/(2\pi)$ in the right places (check
on-line or see if you can derive the general formula).

Using these formulas, you can do lots of exercises, computing the Fourier series of
various functions restricted to the interval $[-\pi, \pi)$.
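
When a closed-form integral is inconvenient, the coefficient formulas can be approximated numerically. The sketch below is one possible approach, not the text's method: it assumes a simple midpoint rule, and the name `fourier_coefficients` and the step count are arbitrary. It is checked against the known coefficients of f(x) = x.

```python
import math

def fourier_coefficients(f, n_max: int, steps: int = 20000):
    """Approximate a_0..a_{n_max} and b_1..b_{n_max} of f on [-pi, pi).

    Uses the midpoint rule for (1/pi) * integral f(x) cos(nx) dx and
    (1/pi) * integral f(x) sin(nx) dx. b[0] is a placeholder (0.0).
    """
    h = 2 * math.pi / steps
    xs = [-math.pi + (k + 0.5) * h for k in range(steps)]
    fx = [f(x) for x in xs]
    a = [sum(v * math.cos(n * x) for v, x in zip(fx, xs)) * h / math.pi
         for n in range(n_max + 1)]
    b = [0.0] + [sum(v * math.sin(n * x) for v, x in zip(fx, xs)) * h / math.pi
                 for n in range(1, n_max + 1)]
    return a, b

a, b = fourier_coefficients(lambda x: x, 5)
for n in range(1, 6):
    # Compare with the series for f(x) = x: a_n = 0, b_n = (-1)^(n+1) * 2/n.
    print(n, round(a[n], 6), round(b[n], 6), (-1) ** (n + 1) * 2 / n)
```

Swapping in a different lambda gives a quick way to check hand-computed exercises on $[-\pi, \pi)$.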

In practice, we can’t build circuits or algorithms that will generate an infinite 
sum of frequencies, but it’s easy enough to stop after any finite number of terms. 
Figure 19.12 shows what we get if we stop the sum after three terms. 

It’s not very impressive, but remember, we are using only three sines/cosines to 
approximate a diagonal line. Not bad, when you think of it that way. Let’s take the 
first 50 terms and see what we get (Figure 19.13). 


Figure 19.13: Fourier partial sum of f(x) = x to n = 50


Now we understand how Fourier series work. We can see the close approximation to
the straight line near the middle of the domain and also recognize the high frequency
effort to get at the sharp “teeth” at each end. Taking it out to 1000 terms, Figure 19.14
shows a stage which we might find acceptable in some applications.




Figure 19.14: Fourier partial sum of f(x) = x to n = 1000 
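
The partial sums plotted in these figures can be evaluated directly from the series for f(x) = x. This is a minimal sketch (the function name `partial_sum` and the sample points are illustrative choices) that prints the 3-, 50- and 1000-term approximations at a few interior points.

```python
import math

def partial_sum(x: float, N: int) -> float:
    """S_N(x) = sum_{n=1}^{N} (-1)^(n+1) * (2/n) * sin(nx), approximating f(x) = x."""
    return sum((-1) ** (n + 1) * (2 / n) * math.sin(n * x) for n in range(1, N + 1))

for N in (3, 50, 1000):
    print(N, [round(partial_sum(x, N), 4) for x in (0.5, 1.0, 2.0, 3.0)])
```

Away from the endpoints the values should approach x as N grows, while points close to $\pm\pi$ converge more slowly because of the jump.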


Note: The Fourier series

$$f(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx$$

always produces a function of x which is periodic on the entire real line, even if we
started with (and only care about) a function, f(x), with bounded domain. The RHS
of this “equation” matches the original f over its original domain, but the domain
of the RHS may be larger. To illustrate this, if we were modeling the function
$f(x) = x^2$, restricted to $[-\pi, \pi)$, the Fourier series would converge on the entire real
line, R, beyond the original domain (See Figure 19.15).




Figure 19.15: f(x) has bounded domain, but its Fourier expansion is periodic. 


The way to think about and deal with this is to simply ignore the infinite number of 
periods magnanimously afforded by the Fourier series expression (as a function of x) 
and only take the one period that lies above f’s original, bounded domain. 


Compact Support, Alone, is Not Enough 


In contrast, if we defined a function which is defined over all R but had compact
support, $[-\pi, \pi]$, it would not have a Fourier series; no single weighted sum of sinusoids
could build this function, because we can't reconstruct the flat f(x) = 0 regions on
the left and right with a single (even infinite) sum. We can break it up into three
regions, and deal with each separately, but that's a different story.




19.4 The Complex Fourier Series of a Periodic Function


19.4.1 Definitions 


We continue to study real functions of a real variable, f(x), which are either periodic
or have a bounded domain. We still want to express them as a weighted sum of special
“pure” frequency functions. I remind you, also, that we are restricting our attention
to functions with period (or domain length) $2\pi$, but our results will apply to functions
having any period T if we tweak them using factors of T or 1/T in the right places.


To convert from sines and cosines to complex numbers, one formula should come
to mind: Euler's formula,

$$e^{i\theta} = \cos\theta + i \sin\theta.$$

Solving this for cosine and sine, we find:

$$
\begin{aligned}
\cos\theta &= \frac{e^{i\theta} + e^{-i\theta}}{2} \\
\sin\theta &= \frac{e^{i\theta} - e^{-i\theta}}{2i}
\end{aligned}
$$

While I won't show the four or five steps, explicitly, we can apply these equivalences
to the real Fourier expression for f,

$$f(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx,$$

to get an equivalent expression,

$$f(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} \frac{1}{2}\left(a_n - i b_n\right) e^{inx} + \sum_{n=1}^{\infty} \frac{1}{2}\left(a_n + i b_n\right) e^{-inx}.$$

Now, let n runneth negative to form our Complex Fourier Series of the (same)
function f.


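The Complex Fourier Series. The Complex Fourier series of a
well-behaved periodic function with period $2\pi$ is the sum

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx},$$

where

$$
c_n \equiv \begin{cases}
\frac{1}{2}\left(a_n - i b_n\right), & n > 0 \\
\frac{1}{2}\left(a_{-n} + i b_{-n}\right), & n < 0 \\
\frac{1}{2} a_0, & n = 0
\end{cases}
$$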
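
The three-case rule for $c_n$ can be checked numerically against the real series. The sketch below is an illustration only: it reuses the coefficients of f(x) = x ($a_n = 0$, $b_n = (-1)^{n+1} 2/n$) from the earlier example, and the function names and the cutoff N are arbitrary. It confirms that the two truncated sums agree and that the complex sum has a negligible imaginary part.

```python
import cmath
import math

def a(n: int) -> float:
    """Cosine coefficients of f(x) = x (all zero)."""
    return 0.0

def b(n: int) -> float:
    """Sine coefficients of f(x) = x."""
    return (-1) ** (n + 1) * 2 / n

def c(n: int) -> complex:
    """Complex coefficient built from a_n, b_n by the three-case rule."""
    if n > 0:
        return 0.5 * (a(n) - 1j * b(n))
    if n < 0:
        return 0.5 * (a(-n) + 1j * b(-n))
    return 0.5 * a(0)

def real_sum(x: float, N: int) -> float:
    return a(0) / 2 + sum(a(n) * math.cos(n * x) + b(n) * math.sin(n * x)
                          for n in range(1, N + 1))

def complex_sum(x: float, N: int) -> complex:
    return sum(c(n) * cmath.exp(1j * n * x) for n in range(-N, N + 1))

x, N = 1.0, 50
print(real_sum(x, N), complex_sum(x, N))
```

The imaginary parts cancel in pairs because $c_{-n}$ is the complex conjugate of $c_n$ whenever the $a_n$ and $b_n$ are real.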




The complex Fourier series is a cleaner sum than the real Fourier expansion which 
uses sinusoids, and it allows us to deal with all the coefficients at once when doing 
computations. The price we pay is that the coefficients, $c_n$, are generally complex
(not to mention the exponentials, themselves). However, even when they are all 
complex, the sum is still real. We have been — and continue to be — interested in 
real-valued functions of a real variable. The fact that we are using complex functions 
and coefficients to construct a real-valued function does not change our focus. 


19.4.2 Computing The Complex Fourier Coefficients 


I expressed the $c_n$ of the complex Fourier series in terms of the $a_n$ and $b_n$ of the real
Fourier series. That was to demonstrate that this new form existed, not to encourage
you to first compute the real Fourier series and, from that, compute the $c_n$ of the
complex form. The formula for computing the complex spectrum, $\{c_n\}$, is

$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-inx} f(x)\, dx.$$

We can learn a lot by placing the complex Fourier expansion of our periodic f on the
same line as the (new, explicit) expression for the $c_n$.

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}, \quad \text{where} \quad c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-inx} f(x)\, dx.$$


Make a mental note of the following observations by confirming, visually, that they’re 
true. 


• We are expressing f as a weighted-sum (weights $c_n$) of complex basis functions
  $e^{inx}$. But the integral is also a kind of sum, so we are simultaneously expressing
  the $c_n$ as a “weighted-sum” (weights f(x)) of complex basis functions $e^{-inx}$.


• Under the first sum, the x in the nth “basis function,” $e^{inx}$, is fixed (that's the
  number at which we are evaluating f), but n is a summation variable; under
  the second integration, it is the n of $e^{-inx}$ which is fixed (that's the index at
  which we are evaluating the coefficient, $c_n$), and x is the integration variable. So
  the roles of x and n are swapped.


• The sequence of complex weights, $\{c_n\}$, is nothing more than a function c(n)
  on the set of all integers, $\mathbb{Z}$. This emphasizes that not only is f() a function of
  x, but c() is a function of n. This way of thinking makes the above expression
  look even more symmetric

  $$f(x) = \sum_{n=-\infty}^{\infty} c(n)\, e^{inx},$$

  while,

  $$c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx.$$




• If we know a function, f(x), we compute a specific spectrum value, $c_n$, by
  multiplying each f(x) by the complex basis function $e^{-inx}$ and integrating x
  from $-\pi$ to $\pi$. If we know a spectrum, $\{c_n\}$, we compute a specific functional
  value, f(x), by multiplying each $c_n$ by the complex basis functions $e^{inx}$ and
  summing n from $-\infty$ to $\infty$.


The correspondence between functions over $\mathbb{R}$ and $\mathbb{Z}$,

$$f(x) \longleftrightarrow c(n),$$

is, conceptually, its own inverse. You do (very roughly) the same thing to get the
spectrum from the function as you do to build the function from its spectrum.


Example 


Let's expand f(x) = x along the complex Fourier basis.


Our goal is to find the complex coefficients, $c_n$, that make the following true:

$$x = \sum_{n=-\infty}^{\infty} c_n e^{inx}.$$

Always begin with the easy one:

$$c_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} x\, e^{-i0x}\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} x\, dx = 0.$$


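The others, both positive and negative, are done in a single breath,

$$
\begin{aligned}
c_n &= \frac{1}{2\pi} \int_{-\pi}^{\pi} x\, e^{-inx}\, dx \\
&= \frac{1}{2\pi} \left[ \frac{e^{-inx}}{n^2} \left( inx + 1 \right) \right]_{-\pi}^{\pi} \\
&= \frac{1}{2\pi n^2} \left[ in\pi \left( e^{-in\pi} + e^{in\pi} \right) + \left( e^{-in\pi} - e^{in\pi} \right) \right] \\
&= \frac{1}{2\pi n^2} \left[ in\pi \left( 2 \cos n\pi \right) - \left( 2i \sin n\pi \right) \right] \\
&= \frac{1}{\pi n^2} \left[ in\pi \left( \cos n\pi \right) - 0 \right] \\
&= \frac{i}{n} \cos n\pi \\
&= (-1)^n \frac{i}{n}.
\end{aligned}
$$

Putting it all together, we get

$$x = \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} (-1)^n \frac{i}{n}\, e^{inx}.$$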
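
As a cross-check on the example, the defining integral for $c_n$ can be estimated numerically and compared with $(-1)^n\, i/n$. This is a rough sketch under stated assumptions: a midpoint rule with an arbitrary step count, and a helper named `c` introduced only for this illustration.

```python
import cmath
import math

def c(f, n: int, steps: int = 20000) -> complex:
    """Midpoint-rule estimate of (1/2pi) * integral_{-pi}^{pi} f(x) e^{-inx} dx."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

for n in (-3, -2, -1, 0, 1, 2, 3):
    exact = 0 if n == 0 else (-1) ** n * 1j / n
    print(n, c(lambda x: x, n), exact)
```

The numerical values should be purely imaginary (up to rounding) and odd in n, as the closed form predicts.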


19.5 Periods and Frequencies 


19.5.1 The Frequency of any Periodic Function 


This short section will be very useful in motivating the approach to Shor's
period-finding algorithm. In order to use it, I'll temporarily need the letter f to mean
frequency (in keeping with the classical scientific literature), so we're going to call our
periodic function under study g(x).


We've been studying periodic functions, g(x) of real x which have periods $T = 2\pi$,
and we have shown how to express them as a sum of either real sines and cosines,

$$g(x) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx,$$

or complex exponentials,

$$g(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}.$$


Each term in these sums has a certain frequency: the n. You may have gotten the
impression that the term “frequency” only applies to functions of the form sin(nx),
cos(nx) or $e^{inx}$. If so, I'd like to disabuse you of that notion (for which my
presentation was partly responsible). In fact any periodic function has a frequency, even
those which are somewhat arbitrary looking.


We'll relax the requirement that our periodic functions have period $T = 2\pi$. That
was merely a convenience to make its Fourier sum take on a standard form. For the 
moment, we don’t care about Fourier series. 


We will define frequency twice, first in the usual way and then using a common 
alternative. 


19.5.2 Ordinary Frequency 
We know what the period, T, of a periodic g(x) means. The frequency, f, of a periodic
g(x) is just the reciprocal of the period,

$$f \equiv \frac{1}{T}.$$


The interpretation is what’s important. 


• The frequency f tells you how many periods of g(x) fit into any unit interval
  (like [0, 1) or [−1/2, 1/2)).


• When the period is on the small side, like 1/10, the frequency will be large (10).
  In this case we'd see 10 repeated patterns if we graphed g(x) between −1/2 and
  1/2. (See Figure 19.16)




Figure 19.16: f = 10 produces ten copies of the period in [—.5, .5) 


• When the period is larger, like 10, the frequency will be small (1/10). In this
  case we'd see only a small portion of the graph of g(x) between −1/2 and 1/2,
  missing the full view of its curves and gyrations, only apparent if we were to
  graph it over a much larger interval containing at least one full period, say −5 to
  +5 or 0 to 10. (See Figure 19.17)


Figure 19.17: f = .1 only reveals one tenth of a period in [−.5, .5)


The take-away here is that

$$fT = 1,$$

and if you know g's period, you know its frequency (and vice versa).


19.5.3 Angular Frequency 


When we ask the question “How many periods fit an interval of length l?” there's
nothing forcing us to choose l = 1. That's a common choice in physics only because
it produces answers per second, per unit or per radian of some cyclic phenomenon.
If, instead, we wanted to express things per cycle or per revolution we would choose
$l = 2\pi$. It's a slightly larger interval, so the same function would squeeze 6+ times as
many periods into it; if you were to repeat something 10 times in the space (or time)
of one unit or radian, you would get 62.8 repetitions in the span of a full revolution
of $2\pi$ units or radians.


Angular frequency, usually denoted by the letter $\omega$, is therefore defined to be

$$\omega \equiv \frac{2\pi}{T}.$$

The relationship between angular frequency and ordinary frequency is

$$\omega = 2\pi f.$$


The relationship between period and angular frequency has the same interpretation
and form as that for ordinary frequency with the qualitatively unimportant change
that the number 1 now becomes $2\pi$. In particular, if you know the function's angular
frequency, you know its period, courtesy of

$$\omega T = 2\pi.$$
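
The two definitions of frequency are simple enough to tabulate. The following sketch (function names chosen here for illustration) converts a few periods into ordinary and angular frequency and prints the products fT and ωT, which should come out as 1 and 2π respectively.

```python
import math

def ordinary_frequency(T: float) -> float:
    """f = 1/T: number of periods per unit interval."""
    return 1.0 / T

def angular_frequency(T: float) -> float:
    """omega = 2*pi/T: number of periods per interval of length 2*pi."""
    return 2 * math.pi / T

for T in (0.1, 10.0, 2 * math.pi):
    f, w = ordinary_frequency(T), angular_frequency(T)
    print(T, f, w, f * T, w * T)
```

The first row reproduces the figure quoted earlier: a period of 1/10 gives 10 repetitions per unit and about 62.8 per full revolution.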




Chapter 20 


The Continuous Fourier Transform 


20.1 From Series to Transform


This is the second of three classical chapters in Fourier theory meant to prepare you
for the quantum Fourier transform or QFT. It fits into the full path according to:


Chapter 19     [Real Fourier Series → Complex Fourier Series]
This Chapter   [Continuous Fourier Transform]
Chapter 21     [Discrete Fourier Transform → Fast Fourier Transform]
Chapter 22     [Quantum Fourier Transform]


In this lesson we learn how any reasonable function (not just periodic ones covered 
in the previous chapter) can be built using exponential functions, each having a 
specific frequency. The sums used in Fourier series turn into integrals this time 
around. 

Ironically, we abandon the integrals after today and go back to sums since the
QFT is a “finite” entity. However, the continuous Fourier transform of this lesson
is the correct foundation for the QFT, so if you can devote the time, it is good reading.


20.2 Motivation and Definitions 


20.2.1 Non-Periodic Functions 


Fourier series assumed (and required) that a function, f, was either periodic or
restricted to a bounded domain before we could claim it was expressible as a
weighted-sum of frequencies. A natural question to ask is whether we can find such
weighted-sum expansions for non-periodic functions defined over all R, with or without compact
support. Can any (well-enough-behaved-but-non-periodic) function be expressed as
a sum of basis functions like $e^{inx}$? There are two reasons to be hopeful.


1. Even a non-periodic function over R can be considered to have bounded domain
   (and therefore be periodic) if we throw away the very distant “wings” (say
   $x < -10^{10^{10}}$ and $x > 10^{10^{10}}$). Therefore, restricted to this extremely large
   interval, at least, it is made up of a weighted-sum of basis functions, either
   sinusoids or $e^{inx}$. All we need to do is fix up those pesky two wings that are so
   far away.


2. The wild, fast squiggliness or the slow, smooth curvature of a graph are not 
exclusive to periodic functions; they are part of any function. We should be 
able to build them out of sines and cosines (or exponentials) whether they 
appear in functions of compact support (even if unbounded domain) or general 
functions over the real line. 


The answer to this question is the Fourier Transform (FT).


20.2.2 Main Result of Complex Fourier Series 


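We begin with a periodic f and restate the duality between it and its (complex)
Fourier series,

$$f(x) = \sum_{n=-\infty}^{\infty} c(n)\, e^{inx}, \qquad c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx.$$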


20.2.3. Tweaking the Complex Fourier Series 


The price we'll have to pay in order to express non-periodic functions as a weighted
sum of frequencies is that the Fourier basis will no longer be a discrete set of
functions, $\{e^{inx}\}_{n \in \mathbb{Z}}$, indexed by an integer, $n \in \mathbb{Z}$, and neither will their
corresponding weights, $\{c_n\}_{n \in \mathbb{Z}}$, be discretely indexable. Instead, n will have to be
replaced by a real number, s. This means c(n) is going to be a full-fledged function
of the real numbers, c(s), $-\infty < s < \infty$.


The above formulas have to be modified in the following ways: 


• The integer n must become a real number s.

• The sum will have to turn into an integral.

• The limits of integration have to be changed from $\pm\pi$ to $\pm\infty$.




• While not required, we make the formulas symmetric by replacing the normalization
constant, $1/(2\pi)$, by $\sqrt{1/(2\pi)}$, thus spreading that constant over
both expressions.


These considerations suggest that if we want to express a non-periodic but well-
behaved function f as a “sum” of frequencies, we would do so using

$$ f(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} c(s)\, e^{isx}\, ds . $$


Since the sum is really an integral, the weights become a function over R, c(s), where 
each real s represents a frequency, and c(s) is “how much” of that frequency the 
function f requires. 


20.2.4 Definition of the Fourier Transform

This weighting function, c(s), is computable from f(x) using the companion formula,

$$ c(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx . $$


It is this last function of s that we call the Fourier Transform, or FT, of the function
f(x), and it is usually denoted by the capital letter of the function we are transforming,
in this case F(s),

$$ F(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx . $$


Notation 


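We denote the Fourier transform operator using “FT,” as in

$$ F \;=\; FT(f), \qquad f \;\xrightarrow{\;FT\;}\; F, $$

or, using the script notation $\mathscr{F}$,

$$ F \;=\; \mathscr{F}(f), \qquad f \;\xrightarrow{\;\mathscr{F}\;}\; F . $$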


20.2.5 The Inverse Fourier Transform 


We also consider the inverse of the Fourier transform, which allows us to recover f
from F,

$$ f \;=\; FT^{-1}(F), \qquad\text{or}\qquad f \;=\; \mathscr{F}^{-1}(F) . $$




If we look at our original motivation for the FT, we see that $FT^{-1}(F)$ is actually
built into that formula. Substitute F(s) in for what we originally called it, c(s), and
we have the explicit form of the inverse Fourier transform,

$$ f(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(s)\, e^{isx}\, ds . $$


The noteworthy items are: 


• $FT^{-1}$ and FT differ only by a sign in the exponent.

• Alternate Definitions

    - We have defined the FT to provide maximum symmetry. Many authors
      omit the factor of $\sqrt{1/(2\pi)}$ in the FT at the cost of having the full factor
      $1/(2\pi)$ in the inverse.

    - There is a second common variant that puts a $2\pi$ into the exponent to
      make the result of certain computations look cleaner.

    - In a third variation, the exponent in the “forward” FT is positive and the
      exponent in the inverse is negative.

    - Be prepared to see the alternate definitions of FT and the correspondingly
      modified results they will yield, usually in the form of a constant factor
      change in our results.

• Sometimes, instead of using a capital letter F to represent $\mathscr{F}(f)$, authors will
use the caret or tilde for that purpose,

$$ \hat{f} \;=\; \mathscr{F}(f) \qquad\text{or}\qquad \tilde{f} \;=\; \mathscr{F}(f) . $$


20.2.6 Real vs. Complex, Even vs. Odd

Although we start with a real-valued function, f, there is no guarantee that F will
be real-valued. After all, we are multiplying f by a complex-valued function, $e^{-isx}$,
prior to integration. However, there are situations in which F will be real, i.e., all the
imaginary parts will cancel out during the integration. We'll need a couple of definitions.


Even Function. A function, f, is said to be even if it has the property

$$ f(-x) \;=\; f(x) . $$

A famously even function is cos x.


Odd Function. A function, f, is said to be odd if it has the property

$$ f(-x) \;=\; -f(x) . $$


Figure 20.2: sin x is odd


A famously odd function is sin x.


Theorem. If f is even, F will be real-valued. If f is odd, F will be
purely imaginary-valued. In all other cases, F will take on values that
have both real and imaginary parts.
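
If you would like to watch the theorem happen numerically, here is a minimal Python sketch. It assumes the symmetric FT definition given above and approximates the integral by a midpoint sum over a finite window; the helper names (`fourier_transform`, `even_f`, `odd_f`, `mixed_f`) and the sample functions are illustrative choices of mine, not part of the course's notation.

```python
import cmath
import math

def fourier_transform(f, s, lo=-8.0, hi=8.0, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral of f(x) e^{-isx} dx.

    The window [lo, hi] stands in for the whole real line, which is only
    reasonable for functions that are negligible outside it (an assumption).
    """
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

def even_f(x):            # even: f(-x) = f(x)
    return math.exp(-x * x)

def odd_f(x):             # odd: f(-x) = -f(x)
    return x * math.exp(-x * x)

def mixed_f(x):           # neither even nor odd
    return (1 + x) * math.exp(-x * x)

s = 1.2
F_even = fourier_transform(even_f, s)    # imaginary part should vanish
F_odd = fourier_transform(odd_f, s)      # real part should vanish
F_mixed = fourier_transform(mixed_f, s)  # both parts survive

print(F_even, F_odd, F_mixed)
```

Up to rounding, the even function produces a real value, the odd one a purely imaginary value, and the mixed one has both parts.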


The domains of both f and F are, in our context, always assumed to be real (x 
and s are both real). It is only the values of the functions f and F that we were 
concerned about. 


Finally, you might notice that there was nothing in any of our definitions that
required the original f be real-valued. In fact, as long as the domain is R, the range
can be $\subseteq \mathbb{C}$. We'll be using this fact implicitly as we consider the symmetric aspects
of the spatial (or time) “domain” (i.e., where f lives) and the frequency “domain”
(where F lives). In other words, we can start out with a complex-valued f and end up
with another complex-valued F, or perhaps even a real-valued F. All combinations
are welcome.


20.2.7 Conditions for a Function to Possess a Fourier Transform


Unlike Fourier series, where practically any reasonable periodic function possessed a 
Fourier series, the same cannot be said of Fourier transforms. Since our function is 




now “free-range” over all of R, like a hyper-chicken in a billion acre ranch, it might
go anywhere. We have to be careful that the Fourier integral converges. One oft-cited
sufficient condition is that f(x) be absolutely integrable, i.e.,

$$ \int_{-\infty}^{\infty} |f(x)|\, dx \;<\; \infty . $$

As you can see, simple functions like f(x) = x or $f(x) = x^2 - x^3 + 1$ don't pass
the absolute-integrability test. We need functions that tend to zero strongly at both
$\pm\infty$, like a pulse or wavelet that peters out at both sides. Some pictures to help you
visualize the graphs of square-integrable functions are seen in figure 20.3. The main
characteristic is that they peter out towards $\pm\infty$.


(The functions that I am claiming possess a Fourier transform are the $f_k(x) =
\psi_k^2(x)$, since these have already been squared, and are ready to be declared absolutely-
integrable, a stronger requirement than square-integrable.)


Figure 20.3: Square-integrable wavefunctions from Wikipedia StationaryStatesAnimation.gif,
leading to the absolutely integrable $\psi_k^2(x)$


20.3 Learning to Compute 


Let’s do a couple even functions, since they give real-valued Fourier transforms which 
are easy to graph. 


20.3.1 Example 1: Rectangular Pulse 


Compute the Fourier transform of

$$ f(x) \;=\; \begin{cases} 1, & |x| \le .5 \\ 0, & \text{everywhere else} \end{cases} $$




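We plug it directly into the definition:

$$ \begin{aligned}
F(s) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx
      \;=\; \frac{1}{\sqrt{2\pi}} \int_{-.5}^{.5} e^{-isx}\, dx \\
     &= \frac{1}{\sqrt{2\pi}} \left[ \frac{e^{-isx}}{-is} \right]_{-.5}^{.5}
      \;=\; \frac{1}{\sqrt{2\pi}}\; \frac{e^{-i(.5s)} - e^{i(.5s)}}{-is} \\
     &= \sqrt{\frac{2}{\pi}}\; \frac{1}{s}\; \frac{e^{i(.5s)} - e^{-i(.5s)}}{2i}
      \;=\; \sqrt{\frac{2}{\pi}}\; \frac{\sin .5s}{s} .
\end{aligned} $$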


That wasn’t so bad. You can see both functions’ graphs in Figure 20.4. They demonstrate
that while f is restricted to a compact support, F requires the entire real line for
its full definition. We’ll see that this is no accident, and has profound consequences.
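
As a sanity check on Example 1, the sketch below compares a brute-force midpoint-sum approximation of the defining integral against the closed form $\sqrt{2/\pi}\,\sin(.5s)/s$. The function names (`fourier_transform`, `pulse`, `pulse_ft_exact`) and the test frequency s = 3 are illustrative assumptions, not anything prescribed by the text.

```python
import cmath
import math

def fourier_transform(f, s, lo, hi, steps=20000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral_lo^hi f(x) e^{-isx} dx."""
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

def pulse(x):
    """Rectangular pulse: 1 on [-.5, .5], 0 everywhere else."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def pulse_ft_exact(s):
    """Closed form from the derivation above (valid for s != 0)."""
    return math.sqrt(2 / math.pi) * math.sin(0.5 * s) / s

s = 3.0
# The pulse vanishes outside [-.5, .5], so integrating over its support suffices.
approx = fourier_transform(pulse, s, -0.5, 0.5)
exact = pulse_ft_exact(s)
print(approx, exact)
```

Because the pulse is even, the imaginary part of the numerical result is zero up to rounding, which matches the real-valued closed form.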




Figure 20.4: A simple function and its Fourier transform 


20.3.2 Example 2: Gaussian

A Gaussian function centered at the origin has the form

$$ f(x) \;=\; N\, e^{-x^2/(2\sigma^2)} . $$

N is the height of its peak at the origin, and $\sigma$ is called the standard deviation, which
conveys what percentage of the total area under f falls between $\pm k\sigma$, for k = 1, 2, 3,
or any multiple we like. When k = 3, 99.7% of the area is covered. (Figure 20.5
demonstrates this.)


Figure 20.5: Interpretation of $\sigma$ (from Wikipedia, Standard_deviation_diagram;
horizontal axis marked from $-3\sigma$ to $3\sigma$)




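Its Fourier transform is

$$ F(s) \;=\; \frac{N}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/(2\sigma^2)}\, e^{-isx}\, dx . $$

This integrand has the form

$$ e^{-ax^2 - isx} $$

(here with $a = 1/(2\sigma^2)$), which can be computed by completing the square and using
a polar coordinate trick (look it up; it's fun). The result is

$$ F(s) \;=\; N\sigma\, e^{-s^2\sigma^2/2} , $$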


which, if you look carefully, is another Gaussian, but now with a different height
and standard deviation. The standard deviation is now $1/\sigma$, rather than $\sigma$. Loosely
speaking, if the “spread” of f is wide, the “spread” of F is narrow, and vice versa.
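
If you don't feel like completing the square, you can at least confirm the result numerically. The sketch below assumes the Gaussian $N e^{-x^2/(2\sigma^2)}$ and the closed form $N\sigma\, e^{-s^2\sigma^2/2}$ as written above; the function names and the particular values of N, $\sigma$ and s are hypothetical test inputs.

```python
import cmath
import math

def gaussian(x, height, sigma):
    return height * math.exp(-x * x / (2 * sigma * sigma))

def gaussian_ft_numeric(s, height, sigma, steps=6000):
    """Midpoint-rule estimate of the FT integral over [-12*sigma, 12*sigma].

    The Gaussian is far below machine precision outside that window, so
    truncating the real line there is harmless for this check.
    """
    half = 12 * sigma
    dx = 2 * half / steps
    total = 0j
    for k in range(steps):
        x = -half + (k + 0.5) * dx
        total += gaussian(x, height, sigma) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

def gaussian_ft_closed(s, height, sigma):
    """Closed form: another Gaussian, with standard deviation 1/sigma."""
    return height * sigma * math.exp(-(s * sigma) ** 2 / 2)

height = 2.0
errors = []
for sigma in (0.5, 2.0):          # a narrow f and a wide f
    for s in (0.0, 0.8):
        numeric = gaussian_ft_numeric(s, height, sigma)
        closed = gaussian_ft_closed(s, height, sigma)
        errors.append(abs(numeric - closed))

print(max(errors))
```

Comparing the two values of $\sigma$ at the same s also shows the reciprocal relationship: the wide Gaussian's transform has already decayed much further by s = 0.8 than the narrow one's.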


20.4 Interlude: The Delta Function 


20.4.1 Characterization of the Delta Function 


An invaluable computational tool in the theory of Fourier transforms is the “impulse”,
$\delta(x)$, a construct that (check your scrutiny at the door, please) behaves like a function
of the real numbers that is zero everywhere except at the origin, where its value is
$\infty$. It also has the property that its integral over $-\epsilon$ to $+\epsilon$ (for any positive $\epsilon$) is 1.
In symbols,

$$ \delta(x) \;=\; \begin{cases} \infty, & \text{if } x = 0 \\ 0, & \text{otherwise} \end{cases}
   \qquad\text{and}\qquad \int_{-\infty}^{\infty} \delta(x)\, dx \;=\; 1 . $$

(Since $\delta$ is 0 away from the origin, the limits of integration can be chosen to be any
interval that contains it.)


The delta function is also known as the “Dirac delta function,” after the physicist, 
Paul Dirac, who introduced it into the literature. It can’t be graphed, exactly, since 
it requires information not visible on a page, but it has a graphic representation as 
shown in figure 20.6. 




Figure 20.6: Graph of the Dirac delta function 


20.4.2 The Delta Function as a Limit of Rectangles 


There are many ways to make this notation rigorous, the simplest of which is to
visualize $\delta(x)$ as the limit of a sequence of functions, $\{\delta_n(x)\}$, the nth “box” function
defined by

$$ \delta_n(x) \;=\; \begin{cases} n, & \text{if } x \in \left[ -\frac{1}{2n},\; \frac{1}{2n} \right] \\ 0, & \text{otherwise} \end{cases} $$

Each $\delta_n(x)$ satisfies the integration requirement, and as $n \to \infty$, $\delta_n(x)$ becomes
arbitrarily narrow and tall, maintaining its unit area all the while. (See Figure 20.7.)




Figure 20.7: Sequence of box functions that approximate the delta function with 
increasing accuracy 


And if we’re not too bothered by the imprecision of informality, we accept the definition

$$ \delta(x) \;=\; \lim_{n \to \infty} \delta_n(x) . $$

In fact, the converging family of functions $\{\delta_n(x)\}$ serves a dual purpose. In a computer
we would select an N large enough to provide some desired level of accuracy and
use $\delta_N(x)$ as an approximation for $\delta(x)$, thus creating a true function (no infinities
involved) which can be used for computations, yet still has the properties we require.




20.4.3. The Delta Function as a Limit of Exponentials 


While simple, the previous definition lacks “smoothness.” Any function that has unit
area and is essentially zero away from the origin will work. A smoother family of
Gaussian functions, indexed by a real parameter a,

$$ \delta_a(x) \;=\; \sqrt{\frac{a}{\pi}}\; e^{-ax^2} , $$

does the job nicely:

$$ \delta(x) \;=\; \lim_{a \to \infty} \delta_a(x) . $$

(See Figure 20.8.)




Figure 20.8: Sequence of smooth functions that approximate the delta function with 
increasing accuracy 


Using the integration tricks mentioned earlier, you can confirm that this has the 
integration properties needed. 
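
For the skeptical, a quick numerical confirmation is sketched below. It assumes the family has the form $\delta_a(x) = \sqrt{a/\pi}\, e^{-ax^2}$; the names `delta_a` and `area` and the sample values of a are my own illustrative choices.

```python
import math

def delta_a(a, x):
    """Smooth approximation to the delta function (assumed Gaussian form)."""
    return math.sqrt(a / math.pi) * math.exp(-a * x * x)

def area(a, steps=4000):
    """Midpoint-rule area under delta_a over a window wide enough to hold it."""
    half = 8.0 / math.sqrt(a)      # about 11 standard deviations on each side
    dx = 2 * half / steps
    return sum(delta_a(a, -half + (k + 0.5) * dx) for k in range(steps)) * dx

sharpness = (1.0, 10.0, 100.0, 1000.0)
areas = [area(a) for a in sharpness]
peaks = [delta_a(a, 0.0) for a in sharpness]

print(areas)   # each entry is 1 up to rounding
print(peaks)   # the peak at the origin keeps growing
```

The area stays pinned at 1 while the peak height grows like $\sqrt{a/\pi}$, which is exactly the "narrow and tall with unit area" behavior we want.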


20.4.4 Sifting Property of the Delta Function 


Among the many properties of the delta function is its ability to pick out (sift) an
individual $f(x_0)$ of any function f at the domain point $x_0$ (no matter how misbehaved
f is),

$$ f(x_0) \;=\; \int_{-\infty}^{\infty} f(x)\, \delta(x - x_0)\, dx , $$

or (equivalently),

$$ f(x_0) \;=\; \int_{-\infty}^{\infty} f(x + x_0)\, \delta(x)\, dx . $$

You can prove this by doing the integration on the approximating sequence $\{\delta_n(x)\}$
and taking the limit.
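
Short of a proof, you can watch the limit happen. The sketch below integrates f against the shifted box functions $\delta_n(x - x_0)$ for increasing n, using f = cos and $x_0 = 0.3$ as arbitrary test inputs; the names `box_delta` and `sift` are hypothetical helpers, not notation from the text.

```python
import math

def box_delta(n, x):
    """The nth box function: height n on [-1/(2n), 1/(2n)], 0 otherwise."""
    return float(n) if abs(x) <= 1.0 / (2 * n) else 0.0

def sift(f, x0, n, steps=2000):
    """Midpoint-rule estimate of the integral of f(x) * box_delta(n, x - x0).

    Only the box's support contributes, so we integrate over
    [x0 - 1/(2n), x0 + 1/(2n)].
    """
    half = 1.0 / (2 * n)
    dx = 2 * half / steps
    total = 0.0
    for k in range(steps):
        x = (x0 - half) + (k + 0.5) * dx
        total += f(x) * box_delta(n, x - x0)
    return total * dx

x0 = 0.3
errors = [abs(sift(math.cos, x0, n) - math.cos(x0)) for n in (1, 10, 100, 1000)]
print(errors)   # shrinks toward 0 as the boxes narrow
```

Each tenfold increase in n cuts the error by roughly a factor of 100, so the sifted value closes in on f(x_0) quickly.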


This sifting property is useful in its own right, but it also gives another way to
express the delta function,

$$ \delta(x) \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{isx}\, ds . $$


It looks weird, I know, but you have all the tools to prove it. Here are the steps:

1. Compute $\mathscr{F}(\delta(x - x_0))$ with the help of the sifting property.

2. Take $\mathscr{F}^{-1}$ of both sides.

3. Set $x_0 = 0$.


[Exercise. Show these steps explicitly.]


20.5 Fourier Transforms Involving $\delta(x)$


It turns out to be convenient to take Fourier transforms of functions that are not
absolutely integrable. However, not only are such functions not guaranteed to have
converging FT integrals, the ones we want to transform in fact do not have converging
FT integrals. That’s not going to stop us, though, because we just introduced a non-
function function, $\delta(x)$, which will be at the “receiving end” when we start with a
not-so-well-behaved f.


20.5.1 Example 3: A Constant 


Consider the function f(x) = 1.

$$ [\mathscr{F}(1)](s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} 1 \cdot e^{-isx}\, dx . $$

The integral looks very much like an expression of the delta function. (Compare it
to the last of our many definitions of $\delta(x)$, about half a page up.) If the exponent
did not have that minus sign, the integral would be exactly $2\pi\,\delta(s)$, making $\mathscr{F}(1) =
\sqrt{2\pi}\,\delta(s)$. That suggests that we use integration by substitution, setting $x' = -x$, and
wind up with an integral that does match this last expression of the delta function.
That would work:


[Exercise. Try it.] 


For an interesting alternative, let’s simply guess that the minus sign in the exponent
doesn’t matter, implying the answer would still be $\sqrt{2\pi}\,\delta(s)$. We then test




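our hypothesis by taking $\mathscr{F}^{-1}\bigl[\sqrt{2\pi}\,\delta(s)\bigr]$ and confirm that it gives us back f(x) = 1.
Watch.

$$ \begin{aligned}
\bigl[\mathscr{F}^{-1}\bigl(\sqrt{2\pi}\,\delta(s)\bigr)\bigr](x)
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sqrt{2\pi}\,\delta(s)\, e^{isx}\, ds \\
  &= \int_{-\infty}^{\infty} \delta(s)\, e^{isx}\, ds \\
  &= \int_{-\infty}^{\infty} \delta(s - 0)\, e^{isx}\, ds \;=\; e^{i0x} \;=\; 1 . \quad\checkmark
\end{aligned} $$

The last line comes about by applying the sifting property of $\delta(s)$ to the function
$f(s) = e^{isx}$ at the point $x_0 = 0$.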


We have a Fourier transform pair

$$ 1 \;\longleftrightarrow\; \sqrt{2\pi}\,\delta(s) , $$

and we could have started with a $\delta(x)$ in the spatial domain which would have given a
constant in the frequency domain. (The delta function apparently has equal amounts
of all frequencies.)


Figure 20.9: $\mathscr{F}[1]$ is a 0-centered delta function (vertical axis: Re[F(s)])


Also, our intuition turned out to be correct: ignoring the minus sign in the exponent
of e did not change the resulting integral, even when we left the integration
boundaries alone; the delta function can be expressed with a negative sign in the
exponent,

$$ \delta(x) \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-isx}\, ds . $$




20.5.2 Example 4: A Cosine 


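Next we try f(x) = cos x. This is done by first solving Euler’s formula for cosine,
then using the definition of $\delta(x)$,

$$ \begin{aligned}
[\mathscr{F}(\cos x)](s)
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos(x)\, e^{-isx}\, dx \\
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left( \frac{e^{ix} + e^{-ix}}{2} \right) e^{-isx}\, dx \\
  &= \frac{1}{2\sqrt{2\pi}} \left( \int_{-\infty}^{\infty} e^{ix(1-s)}\, dx \;+\; \int_{-\infty}^{\infty} e^{-ix(1+s)}\, dx \right) \\
  &= \frac{1}{2\sqrt{2\pi}} \bigl( 2\pi\, \delta(1-s) + 2\pi\, \delta(1+s) \bigr) \\
  &= \sqrt{\frac{\pi}{2}}\; \bigl( \delta(1-s) + \delta(1+s) \bigr) .
\end{aligned} $$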


Figure 20.10: $\mathscr{F}[\cos x]$ is a pair of real-valued delta functions (vertical axis: Re[F(s)])


20.5.3 Example 5: A Sine 


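f(x) = sin x can be derived exactly the same way that we did the cosine, only here
we solve Euler’s formula for sine, giving

$$ [\mathscr{F}(\sin x)](s) \;=\; \frac{1}{i} \sqrt{\frac{\pi}{2}}\; \bigl( \delta(1-s) - \delta(1+s) \bigr)
   \;=\; i \sqrt{\frac{\pi}{2}}\; \bigl( \delta(1+s) - \delta(1-s) \bigr) . $$


Figure 20.11: $\mathscr{F}[\sin x]$ is a pair of imaginary delta functions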


Notice something interesting: this is the first time we see a Fourier transform that 
is not real (never mind that 6(s) isn’t even a function ... we’re inured to that by 




now). What do you remember from the last few sections that would have predicted 
this result? 


Does the FT of sin x (or cos x) make sense? It should. The spectrum of sin x
needs only one frequency to represent it: |s| = 1. (We get two, of course, because
there’s an impulse at $s = \pm 1$, but there’s only one magnitude.) If we had done
sin ax instead, we would have seen the impulses appear at $s = \pm a$, instead of $\pm 1$,
which agrees with our intuition that sin ax requires only one frequency to represent
it. The Fourier coefficients (weights) are zero everywhere except for $s = \pm a$.


(Use this reasoning to explain why a constant function has an FT = one impulse 
at 0.) 


As an exercise, you can throw constants into any of the functions whose FTs we
computed, above. For example, try doing $A \sin(2\pi n x)$.


20.6 Properties of the Fourier Transform 


There are some oft cited facts about the Fourier Transform that we present in this 
section. The first is one we’ll need in this course, and the others are results that you'll 
probably use if you take courses in physics or engineering. 


20.6.1 Translation Invariance


One aspect of the FT we will find useful is its shift property. For real a (the only
kind of number that makes sense when f happens to be a function of R),

$$ f(x - a) \;\xrightarrow{\;\mathscr{F}\;}\; e^{-ias}\, F(s) . $$

This can be stated in various equivalent forms, two of which are

$$ f(x + a) \;\xrightarrow{\;\mathscr{F}\;}\; e^{ias}\, F(s) , \quad\text{and} $$

$$ F(s - a) \;\xrightarrow{\;\mathscr{F}^{-1}\;}\; e^{iax}\, f(x) , $$

where in the last version we do need to state that a is real, since F generally is a
function of C.


If you “translate” (move five feet or delay by two seconds) the function in the 
spatial or time domain, it causes a benign phase-shift (by a) in the frequency domain. 
Seen in reverse, if you translate all the frequencies by a constant, this only multiplies 
the spatial or time signal by a unit vector in C, again, something that can usually 
be ignored (although it may not make sense if you are only considering real f). This 
means that the translation in one domain has no measurable effect on the magnitude 
of the signal in the other. 


This is a kind of invariance, because we don’t care as much about $e^{ias} F(s)$ as we
do its absolute-value-squared, and in that case

$$ \left| e^{ias} F(s) \right|^2 \;=\; \left| e^{ias} \right|^2 \left| F(s) \right|^2 \;=\; \left| F(s) \right|^2 . $$




Since we’ll be calculating probabilities of quantum states and probabilities are the 
amplitudes’ absolute-values-squared, this says that both f(x) and f(x + a) have 
Fourier transforms which possess the same absolute-values and therefore the same 
probabilities. We’ll use this in Shor’s quantum period-finding algorithm. 
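
Here is a small numerical illustration of the shift property, offered as a sketch rather than a proof. It assumes the symmetric FT definition of this chapter, uses a Gaussian as an arbitrary test function, and picks a = 0.7 and s = 1.3 out of a hat; `fourier_transform` and `f` are illustrative names.

```python
import cmath
import math

def fourier_transform(g, s, lo=-12.0, hi=12.0, steps=6000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral of g(x) e^{-isx} dx."""
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += g(x) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

def f(x):
    return math.exp(-x * x / 2)

a, s = 0.7, 1.3
F = fourier_transform(f, s)
F_shifted = fourier_transform(lambda x: f(x + a), s)   # transform of f(x + a)

phase = cmath.exp(1j * a * s)          # the predicted factor e^{ias}
print(abs(F), abs(F_shifted))          # same magnitude
print(F_shifted, phase * F)            # same complex number
```

The two magnitudes agree, and the shifted transform differs from the original only by the unit-length factor $e^{ias}$, so any probability computed from the absolute-value-squared is unchanged.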


20.6.2. Plancherel’s Theorem 


There’s an interesting and crucial fact about f and F: the areas between the graphs
of their squared magnitudes and their domain axes are equal. This turns out to be true
of all (square-integrable) functions and their Fourier transforms, and the result is
called Plancherel’s Theorem.


Plancherel’s Theorem. For any Fourier transform pair, F and f, we
have

$$ \int_{-\infty}^{\infty} |f(x)|^2\, dx \;=\; \int_{-\infty}^{\infty} |F(s)|^2\, ds . $$


A picture demonstrating Plancherel’s Theorem appears in Figure 20.12.


Figure 20.12: Area under |f|? and |F|? are equal (Plancherel’s Theorem) 


The reason this theorem is so important in quantum mechanics is that our func- 
tions are amplitudes of quantum states, and their squared absolute values are the 
probability densities of those states. Since we want any state to sum (integrate) to 1 
over all space (“the particle must be somewhere” ), we need to know that the Fourier 
transform does not disturb that property. Plancherel’s theorem assures us of this 
fact. 
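
To see Plancherel's theorem at work on a concrete function, the sketch below computes both sides numerically for the Gaussian $f(x) = e^{-x^2/2}$. Everything here is an illustrative assumption: the test function, the finite integration windows standing in for the real line, and the helper names `ft` and `integrate`.

```python
import cmath
import math

def f(x):
    return math.exp(-x * x / 2)

def ft(s, half=12.0, steps=800):
    """Midpoint-rule estimate of F(s) = (1/sqrt(2*pi)) * integral of f(x) e^{-isx} dx."""
    dx = 2 * half / steps
    total = 0j
    for k in range(steps):
        x = -half + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

def integrate(g, half, steps):
    """Midpoint-rule integral of g over [-half, half]."""
    dx = 2 * half / steps
    return sum(g(-half + (k + 0.5) * dx) for k in range(steps)) * dx

# Left side: area under |f|^2.  Right side: area under |F|^2.
energy_x = integrate(lambda x: abs(f(x)) ** 2, 12.0, 800)
energy_s = integrate(lambda s: abs(ft(s)) ** 2, 8.0, 400)

print(energy_x, energy_s)   # both come out as sqrt(pi), about 1.7725
```

The right-hand side never uses a closed form for F; it is built entirely from the numerically computed transform, so the agreement is a genuine check.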


20.6.3 Convolution 


One last property that is heavily used in engineering, signal processing, math and 
physics is the convolution theorem. 


A convolution is a binary operator on two functions that produces a third function. 
Say we have an input signal (maybe an image), f, and a filter function g that we 
want to apply to f. Think of g as anything you want to “do” to the signal. Do you 
want to reduce the salt-and-pepper noise in the image? There’s a g for that. Do you 
want to make the image high contrast? There’s a g for that. How about looking at 




only vertical edges (which a robot would care about when slewing its arms)? There’s
another g for that. The filter g is applied to the signal f to get the output signal,
which we denote f * g and call the convolution of f and g.

The convolution of f and g, written f * g, is the function defined by

$$ [f * g](x) \;=\; \int_{-\infty}^{\infty} f(\xi)\, g(x - \xi)\, d\xi . $$


The simplest filter to imagine is one that smooths out the rough edges (“noise”). This
is done by replacing f(x) with a function h(x) which, at each x, is the average over
some interval containing x, say $\pm 2$ from x. With this idea, h(10) would be

$$ h(10) \;=\; K \int_{8}^{12} f(\xi)\, d\xi , $$

while h(10.1) would be

$$ h(10.1) \;=\; K \int_{8.1}^{12.1} f(\xi)\, d\xi . $$

(Here K is some normalizing constant like 1/(sample interval).) If f had lots of change
from any x to its close neighbors (noise), |f(10) - f(10.1)| could be quite large. But
|h(10) - h(10.1)| will be small since the two numbers are integrals over almost the
same interval around x = 10. This is sometimes called a running average and is
used to track financial markets by filtering out the moment-to-moment or day-to-day
noise. Well, this is nothing more than a convolution of f and g, where g(x) = K for
|x| < # days to avg., and 0 everywhere else (K is often chosen to be 1/(# days to avg.)).
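
The running average is easy to try on sampled data. The sketch below is a discrete stand-in for the integral above (a sum replaces the integral, and array indices replace x); the made-up "noisy" signal, the window half-width of 2 samples, and the names `running_average` and `max_jump` are all illustrative assumptions.

```python
def running_average(signal, half_width):
    """Discrete analogue of h = f * g with g = K on a window and 0 elsewhere.

    Returns one averaged value per index whose full window fits in the signal.
    """
    width = 2 * half_width + 1
    kernel = [1.0 / width] * width          # K = 1 / (number of samples averaged)
    out = []
    for i in range(half_width, len(signal) - half_width):
        # (f * g)[i] = sum over j of f[i - j] * g[j], with j running over the window
        out.append(sum(signal[i - j] * kernel[j + half_width]
                       for j in range(-half_width, half_width + 1)))
    return out

def max_jump(values):
    """Largest change between neighboring samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

# A slow upward trend plus alternating +1 / -1 "noise".
signal = [0.1 * i + (1 if i % 2 == 0 else -1) for i in range(40)]
smooth = running_average(signal, 2)

print(max_jump(signal))   # neighbor-to-neighbor change in the raw signal
print(max_jump(smooth))   # much smaller after averaging
```

Neighboring outputs average over almost the same samples, so they can't differ by much, which is the whole point of the filter.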


20.6.4 The Convolution Theorem 


The convolution theorem tells us that rather than apply a convolution to two functions 
directly, we can get it by taking the ordinary point-by-point multiplication of their 
Fourier transforms, something that is actually easier and faster due to fast algorithms 
to compute transforms. 


The Convolution Theorem.

$$ f * g \;=\; \sqrt{2\pi}\;\; \mathscr{F}^{-1}\bigl[\, \mathscr{F}(f) \cdot \mathscr{F}(g) \,\bigr] $$


The constant $\sqrt{2\pi}$ would be replaced by a different constant if we were using one of
the alternate definitions of the Fourier transform.
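
The continuous theorem is awkward to test directly, but its discrete cousin is not. The sketch below is an analogue under stated assumptions, not the theorem itself: it uses a discrete transform normalized by $1/\sqrt{N}$ (the kind of transform taken up in the next chapter) and circular (wrap-around) convolution on short arrays, in which case the constant $\sqrt{2\pi}$ is replaced by $\sqrt{N}$. The names `dft` and `circular_convolution` and the sample arrays are hypothetical.

```python
import cmath
import math

def dft(v, sign=-1):
    """Discrete transform with 1/sqrt(N) normalization.

    sign=-1 is the forward transform; sign=+1 is its inverse.
    """
    n = len(v)
    return [sum(v[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
            / math.sqrt(n)
            for j in range(n)]

def circular_convolution(f, g):
    """(f * g)[i] = sum over k of f[k] * g[(i - k) mod N]."""
    n = len(f)
    return [sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)]

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.0, 0.0, 1.0]
g = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]   # a small smoothing filter
n = len(f)

direct = circular_convolution(f, g)

# Multiply the transforms point by point, invert, and rescale by sqrt(N).
product = [a * b for a, b in zip(dft(f), dft(g))]
via_transform = [math.sqrt(n) * z for z in dft(product, sign=+1)]

print(direct)
print([round(z.real, 10) for z in via_transform])
```

Both routes give the same array. For long signals the transform route wins because fast algorithms compute the transforms much more cheaply than the direct double loop.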


20.7 Period and Frequency in Fourier Transforms 


Let’s take a moment to review the relationship between the period and frequency of
the particularly pure periodic function sin nx for some integer n. For

$$ \sin nx $$




the period is $T = 2\pi/n$, making the angular frequency

$$ \omega \;=\; \frac{2\pi}{T} \;=\; n . $$


Well, this is how we first introduced the idea of frequency informally at the start of
this lecture: I just declared the n in sin nx to be a frequency and showed pictures
suggesting that sines with large n squiggled more than those with small n.

The period-frequency relationship is demonstrated when we look at the Fourier
transform of two sinusoids, one with period $2\pi$ and another with period $2\pi/3$. First,
sin x and its spectrum are shown in figure 20.13, where it is clear that the frequency
domain has only two non-zero frequencies, at $\pm 1$.


Figure 20.13: sin(x) and its spectrum (panels: f(x) and Im[F(s)])


Next, sin 3x and its spectrum appear in figure 20.14, where the spectrum still has only
two non-zero frequencies, but this time they appear at $\pm 3$.


Figure 20.14: sin(3x) and its spectrum (panels: f(x) and Im[F(s)])


Notice that in both cases, when we consider $\omega$ to be the absolute value of the delta
spike’s position, we confirm the formula,

$$ \omega T \;=\; 2\pi , $$


reminding us that if we know the period we know the (angular) frequency, and vice 
versa. 
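
A computer can't produce a true delta spike, but a truncated transform gets the idea across. The sketch below integrates sin 3x against $e^{-isx}$ over a long but finite window and scans a few candidate frequencies for the largest magnitude. The window length (a whole number of periods), the candidate grid, and the name `truncated_ft` are illustrative assumptions.

```python
import cmath
import math

def truncated_ft(f, s, half, steps=20000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral_{-half}^{half} f(x) e^{-isx} dx."""
    dx = 2 * half / steps
    total = 0j
    for k in range(steps):
        x = -half + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * s * x)
    return total * dx / math.sqrt(2 * math.pi)

half = 20 * math.pi                                # finite stand-in for infinity
candidates = [0.5 * m for m in range(1, 11)]       # s = 0.5, 1.0, ..., 5.0

magnitudes = {s: abs(truncated_ft(lambda x: math.sin(3 * x), s, half))
              for s in candidates}
peak = max(magnitudes, key=magnitudes.get)         # where the "spike" sits

period = 2 * math.pi / 3                           # T for sin 3x
print(peak, peak * period, 2 * math.pi)
```

The magnitude is negligible at every candidate except s = 3, where it grows in proportion to the window length, and the product of that frequency with the period is $2\pi$.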


20.8 Applications 


20.8.1 What’s the FT Used For?


The list of applications of the FT is impressive. It ranges from cleaning up noisy audio
signals to applying special filters in digital photography. It’s used in communications,




circuit-building and numerical approximation. In wave and quantum physics, the 
Fourier transform is an alternate way to view a quantum state of a system. 


In one example scenario, we seek to design an algorithm that will target certain
frequencies of a picture, audio or signal, f. Rather than work directly on f, where it
is unclear where the frequencies actually are, we take $F = \mathscr{F}(f)$. Now, we can isolate,
remove, or enhance the exact frequencies of F(s) that interest us. After modifying F to
our satisfaction, we recover $f = \mathscr{F}^{-1}(F)$.


A second application, dearer to quantum physicists, is the representation of states 
of a quantum system. We might model a system such as an electron zipping through a 
known magnetic field (but it could be a laser scattering off a crystal, the spin states of 
two entangled particles, or some other physical or theoretical apparatus). We prepare 
the system in a specific state — the electron is given a particular energy as it enters 
the field at time t = 0. This state is expressed by a wavefunction $\Psi$. If we did a
good job modeling the system, $\Psi$ tells us everything we could possibly know about
the system at a specific time.


Now $\Psi$ is essentially a vector in a Hilbert space, and like all vectors, it can be
expressed in different bases depending on our interest. 


• Position Basis - $\psi(x)$

When expressed in the position basis, it has one form, $\psi(x)$. This form reveals
the amplitudes for the position of the electron at some moment in time; we take
its magnitude-squared, $|\psi(x)|^2$, and integrate that over a region of x-space to
learn the likelihood that the electron’s position, if measured, would be in that
region.


• Momentum Basis - $\varphi(p)$

When expressed in the momentum basis, it has a different form, $\varphi(p)$. This
form reveals the amplitudes for the momentum of the electron: we take its
magnitude-squared, $|\varphi(p)|^2$, and integrate it over a region of p-space to learn
the likelihood that the electron’s momentum, if measured, would fall in that
region.


Both $\psi(x)$ and $\varphi(p)$ tell the exact same story, but from different points of view.
Again, like any vector’s coordinates, we can transform the coordinates to a different
basis using a simple linear transformation. It so happens that the way one transforms
the position representation to the momentum representation in quantum mechanics
is by using the Fourier transform

$$ \varphi(p) \;=\; \mathscr{F}(\psi(x)) . $$


In other words, moving between position space and momentum space is accomplished 
by applying the Fourier transform or its inverse. 




20.8.2 The Uncertainty Principle 


In quantum mechanics, every student studies a Gaussian wave-packet, which has the
same form as that of our example, but with an interpretation: $f = \psi$ is a particle’s
position-state, with $\psi(x)$ being an amplitude. We have seen that the magnitude
squared of the amplitudes tells us the probability that the particle has a certain
position. So, if $\psi$ represents a wave-packet in position space, then $|\psi(x)|^2$ reveals relative
likelihoods that the particle is at position x. Meanwhile $\varphi = \mathscr{F}(\psi)$, as we learned a
moment ago, is the same state in terms of momentum. If we want the probabilities
for momentum, we would graph $|\varphi(s)|^2$.


(Figures 20.15 and 20.16 show two different wave-packet Gaussians after taking
their absolute value-squared, and likewise for their Fourier transforms.)


Figure 20.15: A normalized Gaussian with $\sigma^2 = 3$ and its Fourier transform
(panels: $|f(x)|^2$ and $|F(s)|^2$, both normalized)


Figure 20.16: A more localized Gaussian with $\sigma^2 = 1/7$ and its Fourier transform
(panels: $|f(x)|^2$ and $|F(s)|^2$, both normalized)


The second pair represents an initial narrower spread of its position state compared 
to the first. The uncertainty of its position has become smaller. But notice what 
happened to its Fourier transform, which is the probability density for momentum. 
Its uncertainty has become larger (a wider spread). As we set up the experiment to 
pin down its position, any measurement of its momentum will be less certain. You are 
looking at the actual mathematical expression of the Heisenberg uncertainty principle 
in the specific case where the two observables are position and momentum. 
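
The trade-off can be put in numbers. The sketch below measures the spread (standard deviation) of $|f|^2$ and $|F|^2$ for the two Gaussians pictured, $\sigma^2 = 3$ and $\sigma^2 = 1/7$, taking the closed form of F from Example 2 as given and treating s as the momentum variable, as we have been doing. The helper names and the integration window are illustrative assumptions.

```python
import math

def spread(density, half=30.0, steps=20000):
    """Standard deviation of a zero-centered (unnormalized) density, by midpoint sums."""
    dx = 2 * half / steps
    xs = [-half + (k + 0.5) * dx for k in range(steps)]
    total = sum(density(x) for x in xs)
    second_moment = sum(x * x * density(x) for x in xs)
    return math.sqrt(second_moment / total)

def position_density(sigma2):
    # |f(x)|^2 for f(x) = exp(-x^2 / (2 sigma^2)), constant factors dropped
    return lambda x: math.exp(-x * x / sigma2)

def momentum_density(sigma2):
    # |F(s)|^2 for F(s) proportional to exp(-s^2 sigma^2 / 2), constant factors dropped
    return lambda s: math.exp(-sigma2 * s * s)

results = {}
for sigma2 in (3.0, 1.0 / 7.0):
    dx_spread = spread(position_density(sigma2))
    ds_spread = spread(momentum_density(sigma2))
    results[sigma2] = (dx_spread, ds_spread, dx_spread * ds_spread)

for sigma2, (dx_spread, ds_spread, product) in results.items():
    print(sigma2, dx_spread, ds_spread, product)
```

Squeezing the position spread inflates the momentum spread by exactly the compensating factor: the product of the two spreads comes out to 1/2 in both cases.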


20.9 Summary 


This is a mere fraction of the important techniques and properties of the Fourier 
transform, and I’d like to dig deeper, but we’re on a mission. If you’re interested, I 
recommend researching the sampling theorem and Nyquist frequency. 




And with that, we wrap up our overview of the classical Fourier series and Fourier
transform. Our next step in the ladder to QFT is the discrete Fourier transform.




Chapter 21 


The Discrete and Fast Fourier 
Transforms 


21.1 From Continuous to Discrete 


This is the last of three classical chapters in Fourier theory meant to prepare you for 
the quantum Fourier transform. It fits into the full path according to: 


    Chapter 19      [Real Fourier Series → Complex Fourier Series]
    Chapter 20      [Continuous Fourier Transform]
    This Chapter    [Discrete Fourier Transform → Fast Fourier Transform]
    Chapter 22      [Quantum Fourier Transform]


In this lesson we turn the integrals of the last chapter back into nice finite
sums. This is allowed because the functions we really care about will be defined on a 
finite set of integers, a property shared by the important functions that are the target 
of Shor’s quantum period-finding and factoring algorithms. 


After defining the discrete Fourier transform, or DFT, we'll study a computational
improvement called the fast Fourier transform, or FFT. Even though the
DFT is what we really need in order to understand the use of the QFT in our
algorithms, the development of the recursion relations in the FFT will be needed in
the next chapter when we design the quantum circuitry that implements the QFT.




21.2 Motivation and Definitions 


21.2.1 Functions Mapping $\mathbb{Z}_N \to \mathbb{C}$


Sometimes Continuous Domains Don’t Work 


Instead of functions, $f : \mathbb{R} \to \mathbb{C}$, we turn to functions which are only defined on the
finite set of integers $\mathbb{Z}_N = \{0, 1, 2, \ldots, N-1\}$. This is motivated by two independent
needs:


1. Computers, having a finite capacity, can’t store data for continuous functions
like $0.5x^2 - 1$, $\cos Nx$ or $A e^{2\pi i n x}$. Instead, such functions must be sampled at
discrete points, producing arrays of numbers that only approximate the original
function. We are obliged to process these arrays, not the original functions.


Figure 21.1: Continuous functions on a continuous domain 


Figure 21.2: Sampling a continuous function at finitely many points 


2. Most of the data we analyze don’t have a function or formula behind them. 
They originate as discrete arrays of numbers, making a function of a continuous 
variable inapplicable. 


Functions as Vectors 


Although defined only on a finite set of integers, such a function may still take real or
complex values,

$$ f : \mathbb{Z}_N \;\longrightarrow\; \mathbb{R} \ \text{ or } \ \mathbb{C} , $$




and we can convey all the information about a particular function using an array or
vector,

$$ f \;\longleftrightarrow\;
   \begin{pmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{pmatrix}
   \;=\;
   \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{N-1} \end{pmatrix} . $$

In other words, the function, f, can be viewed as a vector in $\mathbb{C}^N$ (or possibly $\mathbb{R}^N$).
You will see me switch freely between functional notation, f(k), and vector notation,
$f_k$ or $c_k$. The vectors can be 2, 3, or N-dimensional. They may model a 2-D security
cam image, a 3-D printer job or an N-dimensional quantum system.


Applicability of Complex Vectors 


Since N-dimensional quantum systems are the stuff of this course, we’ll need the 
vectors to be complex: Quantum mechanics requires complex scalars in order to ac- 
curately model physical systems and create relative phase differences of superposition 
states. Although our initial vector coordinates might be real (in fact, they may come 
from the tiny set {0,1}), we’ll still want to think of them as living in C. Indeed, our 
operators and gates will convert such coordinates into complex numbers. 


21.2.2 Defining the DFT 


The definitions and results of the classical FT carry over nicely to the DFT. 


Primitive Nth Roots are Central

We start by reviving our notation for the complex primitive Nth root of unity,

$$ \omega_N \;\equiv\; e^{2\pi i/N} . $$

(I say “the” primitive Nth root because in this course I only consider this one number
to hold that title. It removes ambiguity and simplifies the discussion.) When clear
from the context, we'll suppress the subscript N and simply use $\omega$,

$$ \omega \;\equiv\; \omega_N . $$


From the primitive Nth root, we generate all N of the roots (including 1, itself):

$$ \{\, 1,\; \omega,\; \omega^2,\; \omega^3,\; \ldots,\; \omega^{N-1} \,\} $$

or

$$ \{\, 1,\; e^{2\pi i/N},\; e^{4\pi i/N},\; e^{6\pi i/N},\; \ldots,\; e^{(N-1)2\pi i/N} \,\} . $$

These roots will be central to our definition of DFT. 
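
The roots are easy to generate and inspect. The sketch below builds them for N = 5 (the case pictured in Figure 21.3); the function name `roots_of_unity` is an illustrative choice.

```python
import cmath
import math

def roots_of_unity(n):
    """Return [1, w, w^2, ..., w^(n-1)] for the primitive root w = e^{2*pi*i/n}."""
    omega = cmath.exp(2j * math.pi / n)
    return [omega ** k for k in range(n)]

N = 5
roots = roots_of_unity(N)

for k, root in enumerate(roots):
    # Each root has length 1 and returns to 1 when raised to the Nth power.
    print(k, root, abs(root), root ** N)

# Negating the exponents gives the same set in a different order,
# since w^(-k) = w^(N-k).
negated = [roots[1] ** (-k) for k in range(N)]
print(negated)
```

The last few lines illustrate a fact used when the DFT is defined: taking the negative of the exponents only reorders the roots.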




Figure 21.3: Primitive 5th root of 1 (the thick radius) and the four other 5th roots it 
generates 


Recap of the Continuous Fourier Transform 


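Let’s look again at the way the FT maps mostly continuous functions to other mostly
continuous functions. The FT, also written $\mathscr{F}$, was defined as a map between
functions,

$$ F \;=\; \mathscr{F}(f), \qquad f \;\xrightarrow{\;\mathscr{F}\;}\; F , $$

which produced F from f using the formula

$$ F(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx . $$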


It’s easy to forget what the FT is and think of it merely as the above formula, so I’ll
pester you by emphasizing the reason for this definition: we wanted to express f(x)
as a weighted-“sum” of frequencies, s, the weights being F(s),

$$ f(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(s)\, e^{isx}\, ds . $$


Adapting the Definition to the DFT


To define the DFT, we start with the FT and make the following adjustments.




• The integrals will become sums from 0 to N - 1. Rather than evaluating f at
real continuous x, we will be taking the N discrete values of the vector $(f_k)$ and
producing a complex spectrum vector $(F_j)$ that also has N components.


• The factor of $1/\sqrt{2\pi}$ will be replaced by a normalizing factor $1/\sqrt{N}$. This is
a choice, just like the specific flavor of FT we used (e.g., the factor of $1/\sqrt{2\pi}$
which some authors omit). In quantum computing the choice is driven by our
need for all vectors to live on the projective sphere in Hilbert space and therefore
be normalized.


• We will use Nth roots of unity, $\omega^{jk}$, in place of the general exponential, $e^{isx}$.
[You can skip the remainder of this bullet, without loss of crucial information,
but for those wishing a more complete explanation, read on.]


Consider $e^{-isx}$ to be a function of x which treats s as a constant frequency, a
continuous parameter that nails down the function,

$$ \varphi_s(x) \;=\; e^{-isx} . $$

Let’s rewrite the spectrum, F, using the symbolism of this s-parameter family,
$\varphi_s(x)$, in place of $e^{-isx}$,

$$ F(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, \varphi_s(x)\, dx . $$


In the discrete case, we want functions of the index k, each of whose constant 
frequency is parametrized by the discrete parameter 7. The Nth roots of unity 
provide the ideal surrogate for the continuously parametrized y,. To make the 
analogy to the F7 as true as possible, I will take the negative of all the roots’ 
exponents (which produces the same N roots of unity, but in a different order), 
and define 


The N functions {bj} replace the infinite {y,} 


continuous basis functions of x, e~ 


scr: Im other words, the 


‘st yarametrized by s, become N vectors, 


U30 w Io 
VUj1 is 
Vv; = : = se p SO, Wea Vd 
J Vik WW) jk ’ J aaa) ’ 
Uj(N-1) VagN=l) 


where k is the coordinate index and j is the parameter that labels each vector. 


584 


Official Definition of DFT 


And so we find ourselves compelled to define the discrete Fourier transform, or DFT , 
as follows. 


Definition of DFT. An Nth order DFT is a function of an N-component
vector $f = (f_k)$ that produces a new N-component vector $F = (F_j)$ described
by the formula giving the output vector's N coordinates,

\[ F_j \;\equiv\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk},
   \qquad \text{for } j = 0, \ldots, N-1 , \]

where $\omega = \omega_N$ is the primitive Nth root of 1.


If we need to emphasize the order of the DFT, we could use a parenthesized exponent,
$DFT^{(N)}$. However, when the common dimension of the input and output vectors is
understood from the context, we usually omit the exponent (N) as well as the phrase "Nth
order" and just refer to it as the DFT.
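
To make the definition concrete, the following is a minimal sketch that evaluates the formula literally, one output coordinate at a time. The names naiveDft() and Cplx are hypothetical and exist only for this example; std::complex stands in for the Complex class used by the chapter's own listings.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// Hypothetical helper: F_j = (1/sqrt(N)) * sum_k f_k * w^(-jk), with w = e^(2*pi*i/N).
std::vector<Cplx> naiveDft(const std::vector<Cplx>& f)
{
    const int N = static_cast<int>(f.size());
    assert(N > 0);
    const double PI = std::acos(-1.0);
    std::vector<Cplx> F(N);
    for (int j = 0; j < N; j++) {
        Cplx sum(0.0, 0.0);
        for (int k = 0; k < N; k++) {
            // w^(-jk); reducing jk mod N is legal because w^N = 1
            sum += f[k] * std::polar(1.0, -2.0 * PI * ((j * k) % N) / N);
        }
        F[j] = sum / std::sqrt(static_cast<double>(N));
    }
    return F;
}
```

The two nested loops already hint at the $N^2$ cost discussed later in the chapter.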


Notation 
Common notational options that we'll use for the DFT include

\[ F \;=\; DFT(f) \;=\; DFT[f] . \]

The jth coordinate of the output can be also expressed in various equivalent ways,

\[ F_j \;=\; [DFT(f)]_j \;=\; DFT(f)_j \;=\; DFT[f]_j . \]

The last two lack surrounding parentheses or brackets, but the subscript j still applies
to the entire output vector, F.

Note on Alternate Definitions 


Just as with the FT, the DFT has many variants all essentially equivalent but
producing slightly different constants or minus signs in the results. In one case the
forward DFT has no factor of $1/\sqrt{N}$, but the reverse DFT contains a full $1/N$. In
another, the exponential is positive. To make things more confusing, a third version
has a positive exponent of $\omega$, but $\omega$ is defined to have the minus sign built into it, so
the overall definition is actually the same as ours. Be ready to see deviations as you
perambulate the literature.



Inverse DFT 


Our expectation is that $\{F_j\}$ so defined will provide the weighting factors needed to
make an expansion of f as a weighted sum of the frequencies $\omega^j$,

\[ f_k \;=\; \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} F_j\, \omega^{jk} , \]

valid. If the DFT is to truly represent a discrete rendition of the continuous FT,
not only should this equality hold, but it will provide the definition of the inverse
transform, $DFT^{-1}$. Therefore, we must verify the formula, or stated differently,
consider it to be the definition of $DFT^{-1}$ and prove that it is consistent with the
DFT in the sense that $DFT^{-1} \circ DFT = 1$.


(The "−1" in $DFT^{-1}$ means inverse, not order, the latter always appearing in
parentheses. Of course, "order (−1)" doesn't make sense, anyway.)


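Proof of Consistency of the Definitions. We substitute the definition of DFT
of a vector $(f_k)$ into the expression of f as a weighted sum of $F_j$ in the definition of
$DFT^{-1}$ to see if we recover the original vector $(f_k)$. Let's have some fun. We
replace $F_j$ by its definition, being careful not to re-use the spent index, k (the
inner sum uses m instead):

\[
\begin{aligned}
\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} F_j\, \omega^{jk}
  &= \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1}
     \left( \frac{1}{\sqrt{N}} \sum_{m=0}^{N-1} f_m\, \omega^{-jm} \right) \omega^{jk} \\
  &= \frac{1}{N} \sum_{m=0}^{N-1} f_m \left( \sum_{j=0}^{N-1} \omega^{j(k-m)} \right) .
\end{aligned}
\]

From exercise (d) of the section Roots of Unity (at the end of our complex arithmetic
lecture) the sum in parentheses collapses to $N\delta_{km}$, so the double sum becomes

\[ \frac{1}{N} \sum_{m=0}^{N-1} f_m\, N \delta_{km} \;=\; f_k , \]

which recovers the original vector. QED
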
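
The consistency claim, $DFT^{-1} \circ DFT = 1$, can also be checked numerically. The sketch below is illustrative only: dft() and roundTripError() are hypothetical names, and the sign argument simply selects the negative exponent (forward) or the positive exponent (inverse) from the two formulas above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// Hypothetical helper: sign = -1 gives the forward DFT, sign = +1 the inverse.
std::vector<Cplx> dft(const std::vector<Cplx>& v, int sign)
{
    const int N = static_cast<int>(v.size());
    assert(N > 0 && (sign == 1 || sign == -1));
    const double PI = std::acos(-1.0);
    std::vector<Cplx> out(N);
    for (int j = 0; j < N; j++) {
        Cplx sum(0.0, 0.0);
        for (int k = 0; k < N; k++)
            sum += v[k] * std::polar(1.0, sign * 2.0 * PI * ((j * k) % N) / N);
        out[j] = sum / std::sqrt(static_cast<double>(N));
    }
    return out;
}

// Largest |f_k - (DFT^{-1}(DFT f))_k|; should be at round-off level.
double roundTripError(const std::vector<Cplx>& f)
{
    std::vector<Cplx> back = dft(dft(f, -1), +1);
    double worst = 0.0;
    for (std::size_t k = 0; k < f.size(); k++)
        worst = std::max(worst, std::abs(back[k] - f[k]));
    return worst;
}
```

Because both directions carry the same $1/\sqrt{N}$, no extra rescaling is needed after the round trip.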


21.3. Matrix Representation of the DFT 


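If we let $\zeta = \omega^{-1}$ (and remember, we are using the shorthand $\omega = \omega_N = e^{2\pi i/N}$), the
following matrix, W, encodes the DFT in a convenient way. Define W as

\[
W \;\equiv\; \frac{1}{\sqrt{N}}
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^{2} & \zeta^{3} & \cdots & \zeta^{N-1} \\
1 & \zeta^{2} & \zeta^{4} & \zeta^{6} & \cdots & \zeta^{2(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{j} & \zeta^{2j} & \zeta^{3j} & \cdots & \zeta^{j(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{N-1} & \zeta^{2(N-1)} & \zeta^{3(N-1)} & \cdots & \zeta^{(N-1)(N-1)}
\end{pmatrix} .
\]

Now, we can express DFT of the vector $(f_k)$ as

\[ DFT[f] \;=\; W \cdot (f_k) . \]

To see this, just write it out long hand.

\[
\frac{1}{\sqrt{N}}
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^{2} & \zeta^{3} & \cdots & \zeta^{N-1} \\
1 & \zeta^{2} & \zeta^{4} & \zeta^{6} & \cdots & \zeta^{2(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{j} & \zeta^{2j} & \zeta^{3j} & \cdots & \zeta^{j(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{N-1} & \zeta^{2(N-1)} & \zeta^{3(N-1)} & \cdots & \zeta^{(N-1)(N-1)}
\end{pmatrix}
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_k \\ \vdots \\ f_{N-1} \end{pmatrix}
\]

The jth component of this product is the sum

\[ \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \zeta^{jk}
   \;=\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk} , \]

which is just $DFT[f]_j$.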
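
A small sketch of the matrix form, assuming nothing beyond the definition of W above: dftMatrix() fills in the entries $\zeta^{jk}/\sqrt{N}$ and multiply() forms the product $W \cdot (f_k)$. Both function names, and the aliases Cplx and Matrix, are hypothetical and used only here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;               // hypothetical aliases
using Matrix = std::vector<std::vector<Cplx>>;   // for this sketch only

// W[j][k] = zeta^(jk) / sqrt(N), where zeta = e^(-2*pi*i/N).
Matrix dftMatrix(int N)
{
    assert(N > 0);
    const double PI = std::acos(-1.0);
    const double scale = 1.0 / std::sqrt(static_cast<double>(N));
    Matrix W(N, std::vector<Cplx>(N));
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            W[j][k] = scale * std::polar(1.0, -2.0 * PI * ((j * k) % N) / N);
    return W;
}

// Ordinary matrix-vector product, F = W f.
std::vector<Cplx> multiply(const Matrix& W, const std::vector<Cplx>& f)
{
    assert(!W.empty() && W[0].size() == f.size());
    std::vector<Cplx> F(W.size(), Cplx(0.0, 0.0));
    for (std::size_t j = 0; j < W.size(); j++)
        for (std::size_t k = 0; k < f.size(); k++)
            F[j] += W[j][k] * f[k];
    return F;
}
```

Since W depends only on N, it could be built once and reused for every input vector of that size.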


21.4 Properties of DFT 


All the results for the continuous FT carry over to the DFT. (If you skipped the 
continuous FT chapter, take these as definitions without worry.) 


21.4.1 Convolution of Two Vectors 


In vector form, convolution of two discrete functions, f and g, becomes

\[ [f * g]_k \;\equiv\; \sum_{l=0}^{N-1} f_l\, g_{k-l} . \]

The convolution theorem for continuous FT holds in the discrete case, where we still
have a way to compute the (this time discrete) convolution by using DFT:

\[ f * g \;=\; \sqrt{N}\; DFT^{-1} \big[\, DFT(f) \cdot DFT(g) \,\big] . \]
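
The theorem can be spot-checked numerically. The sketch below assumes the index k − l in the convolution is taken modulo N (vectors treated as periodic) and that the product DFT(f)·DFT(g) is coordinate-wise; under the $1/\sqrt{N}$ normalization used in this chapter the constant in front works out to $\sqrt{N}$. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// sign = -1: forward DFT, sign = +1: inverse DFT (both with 1/sqrt(N)).
std::vector<Cplx> dft(const std::vector<Cplx>& v, int sign)
{
    const int N = static_cast<int>(v.size());
    assert(N > 0 && (sign == 1 || sign == -1));
    const double PI = std::acos(-1.0);
    std::vector<Cplx> out(N);
    for (int j = 0; j < N; j++) {
        Cplx sum(0.0, 0.0);
        for (int k = 0; k < N; k++)
            sum += v[k] * std::polar(1.0, sign * 2.0 * PI * ((j * k) % N) / N);
        out[j] = sum / std::sqrt(static_cast<double>(N));
    }
    return out;
}

// Direct definition: [f*g]_k = sum_l f_l g_{(k-l) mod N}.
std::vector<Cplx> circularConvolution(const std::vector<Cplx>& f,
                                      const std::vector<Cplx>& g)
{
    const int N = static_cast<int>(f.size());
    assert(g.size() == f.size());
    std::vector<Cplx> out(N, Cplx(0.0, 0.0));
    for (int k = 0; k < N; k++)
        for (int l = 0; l < N; l++)
            out[k] += f[l] * g[((k - l) % N + N) % N];
    return out;
}

// Same result by way of sqrt(N) * DFT^{-1}[ DFT(f) . DFT(g) ].
std::vector<Cplx> convolutionViaDft(const std::vector<Cplx>& f,
                                    const std::vector<Cplx>& g)
{
    assert(g.size() == f.size());
    std::vector<Cplx> F = dft(f, -1), G = dft(g, -1);
    for (std::size_t j = 0; j < F.size(); j++)
        F[j] *= G[j];                               // coordinate-wise product
    std::vector<Cplx> out = dft(F, +1);
    const double rootN = std::sqrt(static_cast<double>(f.size()));
    for (std::size_t k = 0; k < out.size(); k++)
        out[k] *= rootN;
    return out;
}
```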


21.4.2 Translation Invariance (Shift Property) for Vectors 


Translation invariance, a.k.a. the shift property, holds in the discrete case in the form
of

\[ f_{k-l} \;\overset{DFT}{\longmapsto}\; \omega^{-lj}\, F_j . \]

As you can see, it's the same idea: a spatial or time translation in one domain (shifting
the index by −l) corresponds to a phase shift (multiplication by a root of unity) in
the other domain.
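
Here is a minimal numerical check of the shift property, assuming the shifted index k − l wraps around modulo N. The helper names dft() and shiftPropertyError() are hypothetical; the latter reports the largest gap between the DFT of the shifted vector and the phase-multiplied spectrum $\omega^{-lj} F_j$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// Forward DFT straight from the definition.
std::vector<Cplx> dft(const std::vector<Cplx>& v)
{
    const int N = static_cast<int>(v.size());
    assert(N > 0);
    const double PI = std::acos(-1.0);
    std::vector<Cplx> out(N);
    for (int j = 0; j < N; j++) {
        Cplx sum(0.0, 0.0);
        for (int k = 0; k < N; k++)
            sum += v[k] * std::polar(1.0, -2.0 * PI * ((j * k) % N) / N);
        out[j] = sum / std::sqrt(static_cast<double>(N));
    }
    return out;
}

// max_j | DFT[shifted f]_j - w^(-l j) F_j |, where shifted_k = f_{(k-l) mod N}.
double shiftPropertyError(const std::vector<Cplx>& f, int l)
{
    const int N = static_cast<int>(f.size());
    const double PI = std::acos(-1.0);
    std::vector<Cplx> shifted(N);
    for (int k = 0; k < N; k++)
        shifted[k] = f[((k - l) % N + N) % N];
    std::vector<Cplx> F = dft(f), G = dft(shifted);
    double worst = 0.0;
    for (int j = 0; j < N; j++) {
        Cplx predicted = std::polar(1.0, -2.0 * PI * l * j / N) * F[j];
        worst = std::max(worst, std::abs(G[j] - predicted));
    }
    return worst;
}
```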


21.4.3 Computational Complexity of DFT 


The formula

\[ DFT[f]_j \;=\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk} \]

tells us that we have ~ N complex terms to add and multiply for each resulting
component $F_j$ in the spectrum. Since the spectrum consists of N coordinates, that's
$N^2$ complex operations, each of which consists of two or three real operations, so it's
still $O(N^2)$ in terms of real operations. Furthermore, we often use fixed point (fixed
precision) arithmetic in computers, making the real sums and products independent
of the size, N, of the vector. Thus, the DFT is $O(N^2)$, period.

You might argue that as N increases, so will the precision needed by each floating
point multiplication or addition, which would require that we incorporate the number
of digits of precision, m, into the growth estimate, causing it to be approximately
$O(N^2 m^2)$. However, we'll stick with $O(N^2)$ and call this the time complexity relative
to the arithmetic operations, i.e., "above" them. This is simpler, often correct and
will give us a fair basis for comparison with the up-coming fast Fourier transform
and quantum Fourier transform.


21.5 Period and Frequency in Discrete Fourier Transforms


A DFT applies only to functions defined on the bounded integral domain of integers
$\mathbb{Z}_N = \{0, 1, 2, \ldots, N-1\}$. Within that interval of N domain numbers, if the function
repeats itself many times (that is, if it has a period T, where T << N, i.e., T is much
less than N), then the function would exhibit periodicity relative to the domain.
That, in turn, implies it has an associated frequency, f. In the continuous cases, T
and f are related by one of two common relationships, either

\[ T f \;=\; 1 \]
or
\[ T f \;=\; 2\pi . \]

One could actually make the constant on the RHS different from 1 or $2\pi$, although
it's rarely done. But in the discrete case we do use a different constant, namely, N.
We still have the periodicity condition that

\[ f(k + T) \;=\; f(k) , \]

for all k (when both k and k+T are in the domain), but now the frequency is defined
by

\[ T f \;=\; N . \]


If you were to compute the DFT of a somewhat random vector like 


( .1, .5, 0, .25, .15, .3, 0, .1,
  ...,
1, .25, 0, .25, .15, .0, .6, .1, 
1, 0, .5, .25, .0, .3, 0, .1, 
1, 0, 0, .25, .15, .8, 0, .1) , 


you would see no dominant frequency, which agrees with the fact that the function is 
not periodic. (See figure 21.4) 




Figure 21.4: The DFT of a non-periodic vector 


On the other hand, a periodic vector, like 


589 


( .1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1, 
4, 0505. «254 2255 485. 0; <4 
1, 0; 0,225, 245. 180 0; <4; 
1, 0, 0, .25, .15, .8, 0, 1, 
1, 0, 0, .25, .15, .8, 0, 1, 
1, 0, 0, .25, .15, .3, 0, .1, 
1, 0, 0, .25, .15, .3, 0, .1 ) 


exhibits a very strongly selective DFT, as figure 21.5 shows. 




Figure 21.5: The spectrum of a periodic vector 


A very pure periodic vector akin to the pure "basis functions" of the continuous cases,
like $\sin nx$, $\cos nx$ and $e^{inx}$, would be one like


(0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0, 0, 
0, 0, 0, .25, 0, 0, 0,0), 


and this one has a DFT in which all the non-zero frequencies in the spectrum have 
the same amplitudes, as seen in figure 21.6. 


Figure 21.6: The spectrum of a very pure periodic vector


For all these examples I used N = 128, and for the two periodic cases, the period
was T = 8 (look at the vectors), which would make the frequency f = N/8 = 128/8 =
16. You can see that all of the non-zero amplitudes in the spectrum (i.e., the DFT)
sit at multiples of 16: 16, 32, 48, etc. (There is a "phantom spike" at 128, but that's
beyond the end of the vector and merely an artifact of the graphing software, not
really part of the spectrum.)


The thing to note here is that by looking at the spectrum we can see multiples of 
the frequency in the form cf, for c= 1,2,3,...,7 and from that, deduce f and from 
f, get T. 
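
The deduction "spectrum, then f, then T" can be sketched in a few lines. This is only an illustration under the assumption that the signal is exactly periodic, so that every non-zero amplitude sits at a multiple of f and their greatest common divisor recovers f. The names repeatPattern(), amplitudeSpectrum(), gcdInt() and estimateFrequency(), and the threshold eps, are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper: concatenates `copies` repetitions of `pattern`.
std::vector<double> repeatPattern(const std::vector<double>& pattern, int copies)
{
    std::vector<double> out;
    for (int c = 0; c < copies; c++)
        out.insert(out.end(), pattern.begin(), pattern.end());
    return out;
}

// |F_j| for j = 0..N-1, computed straight from the DFT definition.
std::vector<double> amplitudeSpectrum(const std::vector<double>& f)
{
    const int N = static_cast<int>(f.size());
    assert(N > 0);
    const double PI = std::acos(-1.0);
    std::vector<double> amps(N);
    for (int j = 0; j < N; j++) {
        std::complex<double> sum(0.0, 0.0);
        for (int k = 0; k < N; k++)
            sum += f[k] * std::polar(1.0, -2.0 * PI * ((j * k) % N) / N);
        amps[j] = std::abs(sum) / std::sqrt(static_cast<double>(N));
    }
    return amps;
}

int gcdInt(int a, int b) { return b == 0 ? a : gcdInt(b, a % b); }

// gcd of all positions j > 0 whose amplitude exceeds eps; 0 if there are none.
int estimateFrequency(const std::vector<double>& f, double eps = 1e-9)
{
    std::vector<double> amps = amplitudeSpectrum(f);
    int g = 0;
    for (std::size_t j = 1; j < amps.size(); j++)
        if (amps[j] > eps)
            g = gcdInt(g, static_cast<int>(j));
    return g;
}
```

For the periodic vector above (16 copies of the eight-element pattern, N = 128), estimateFrequency() should come out as 16, and N divided by that value gives back the period T = 8.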


21.6 A Cost-Benefit Preview of the FFT 


21.6.1 Benefit 


Although merely a computational technique to speed up the DFT, the fast Fourier 
transform, or FFT, is well worthy of our attention. Among its accolades, 


1. it is short and easy to derive, 
2. it has wide application to circuit and algorithm design, 


3. it improves the DFT's $O(N^2)$ to $O(N \log N)$, a speed up that changes the
computational time of some large arrays from over an hour to a fraction of a
second, and


4. the recursive nature of the solution sets the stage for studying the quantum 
Fourier transform (QFT) circuit. 


21.6.2 Cost 


A slightly sticky requirement is that it only operates on vectors which have exactly
$N = 2^n$ components, but there are easy work-arounds. One is to simply pad a
deficient f with enough 0s to bring it up to the next power-of-two. For example, we
might upgrade a 5-vector to an 8-vector like so:

\[
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \\ f_4 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]

Therefore, until further notice, we assume that $N = 2^n$, for some $n \ge 0$.
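
The padding work-around is easy to sketch. The helper name padToPowerOfTwo() is hypothetical, and treating an empty vector as a one-element zero vector is an assumption made only so the function always returns a power-of-two length.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// Hypothetical helper: appends zeros until the length is the next power of two.
// A vector whose length is already a power of two is returned unchanged.
std::vector<Cplx> padToPowerOfTwo(std::vector<Cplx> f)
{
    std::size_t n = 1;
    while (n < f.size())
        n <<= 1;                       // 1, 2, 4, 8, ... until n >= f.size()
    assert(n >= f.size());
    f.resize(n, Cplx(0.0, 0.0));       // new slots are filled with 0
    return f;
}
```

A 5-vector comes back as an 8-vector whose last three coordinates are 0, matching the picture above.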


21.7 Recursive Equation for FFT 


The development of a truly fast FFT proceeds in two stages. In this section, we 
develop a recursive relation for the algorithm. In the next section, we show how 
to turn that into a non-recursive, i.e., iterative, solution which is where the actual 
speed-up happens. 


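21.7.1 Splitting f into $f^{even}$ and $f^{odd}$


Given a vector $f = (f_k)$ of dimension $N = 2^n$, divide it notationally into two vectors
of size N/2 called $f^{even}$ and $f^{odd}$, defined by

\[
f^{even} \;\equiv\; \begin{pmatrix} f_0 \\ f_2 \\ f_4 \\ \vdots \\ f_{N-2} \end{pmatrix}
\qquad \text{and} \qquad
f^{odd} \;\equiv\; \begin{pmatrix} f_1 \\ f_3 \\ f_5 \\ \vdots \\ f_{N-1} \end{pmatrix} .
\]

In terms of coordinates,

\[ f^{even}_k \;=\; f_{2k} \]
and
\[ f^{odd}_k \;=\; f_{2k+1} , \]

for k = 0, 1, ..., N/2 − 1. This leads to a string of equalities.

\[
\begin{aligned}
DFT[f]_j
 &= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk} \\
 &= \frac{1}{\sqrt{N}} \left( \sum_{k=0}^{N/2-1} f_{2k}\, \omega^{-j(2k)}
      \;+\; \sum_{k=0}^{N/2-1} f_{2k+1}\, \omega^{-j(2k+1)} \right) \\
 &= \frac{1}{\sqrt{N}} \left( \sum_{k=0}^{N/2-1} f^{even}_k\, \omega^{-j(2k)}
      \;+\; \omega^{-j} \sum_{k=0}^{N/2-1} f^{odd}_k\, \omega^{-j(2k)} \right) \\
 &= \frac{1}{\sqrt{N}} \left( \sum_{k=0}^{N/2-1} f^{even}_k \left(\omega^{2}\right)^{-jk}
      \;+\; \omega^{-j} \sum_{k=0}^{N/2-1} f^{odd}_k \left(\omega^{2}\right)^{-jk} \right)
\end{aligned}
\]

We'll rewrite this to our advantage, next.
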


21.7.2 Nth Order DFT in Terms of Two (N/2)th Order DFTs


Now, $\omega = \omega_N$ was our primitive Nth root of unity, so $\omega^2$ is an (N/2)th root of unity
(e.g., $(\omega_8)^2 = \omega_4$). If we rewrite the final sum using $\omega'$ for $\omega^2$, $N'$ for N/2 and
labeling the $\omega$ outside the odd sum as an Nth root, things look very interesting:

\[ DFT[f]_j \;=\; \frac{1}{\sqrt{N}} \left( \sum_{k=0}^{N'-1} f^{even}_k\, (\omega')^{-jk}
     \;+\; \omega_N^{-j} \sum_{k=0}^{N'-1} f^{odd}_k\, (\omega')^{-jk} \right) . \]

We recognize each sum on the RHS as an order N/2 DFT (apart from that DFT's own
normalizing factor, $1/\sqrt{N/2}$), so let's go ahead and label them as such, using an
exponent label to help identify the orders. We get

\[ DFT^{(N)}[f]_j \;=\; \frac{1}{\sqrt{2}} \left( DFT^{(N/2)}\big[f^{even}\big]_j
     \;+\; \omega_N^{-j}\, DFT^{(N/2)}\big[f^{odd}\big]_j \right) . \]

Since the j on the LHS can go from 0 → (N − 1), while both DFTs on the RHS are
only size N/2, we have to remind ourselves that we consider all these functions to be
periodic when convenient.


Start of Side-Trip 


    Let's clarify that last statement by following a short sequence of maneuvers.
    First turn an N/2 dimensional vector, call it g, into a "periodic
    vector" that is N-dimensional (or even infinite-dimensional) by assigning
    values to the "excess" coordinates based on the original N/2 with the help
    of an old trick,

    \[ g(p + N/2) \;\equiv\; g(p) . \]

    This effectively turns a function g defined only on the bounded domain
    $\mathbb{Z}_{N/2}$ into a periodic function of all integers $\mathbb{Z}$.

    Next, we prove that the definition of DFT implies that the order-N/2
    spectrum of an N/2-periodic g, $DFT^{(N/2)}(g)$, is also periodic with period
    N/2.

    [Exercise. Prove the last claim by considering a j with $N > j \ge N/2$
    and evaluating $DFT^{(N/2)}[g]_j$ from the definition of an order N/2 DFT.
    Hint: Let j = N/2 + p, where p < N/2. Then use the fact that for an
    N/2-order DFT, we use $\omega_{N/2}$ as our root-of-unity. Reduce the resulting
    expression from the observation that $(\omega_{N/2})^{N/2} = 1$. That provides a kind
    of N/2-periodicity of the N/2-order DFT over the larger interval [0, N].
    Close the deal by extending the logic to larger and larger intervals.]



End of Side-Trip 


The upshot is that the j on the RHS can always be taken modulo the size of the
vectors on the RHS, N/2. We'll add that detail for utter clarity:

\[ DFT^{(N)}[f]_j \;=\; \frac{1}{\sqrt{2}} \left( DFT^{(N/2)}\big[f^{even}\big]_{(j \bmod N/2)}
     \;+\; \omega_N^{-j}\, DFT^{(N/2)}\big[f^{odd}\big]_{(j \bmod N/2)} \right) . \]

Finally, let's clear the smoke using shorthand like $F^{(N)} = DFT^{(N)}(f)$, $F_E = F^{even}$
and $F_O = F^{odd}$. The final form is due to Danielson and Lanczos, and dates back to
1942.


21.7.3 Danielson-Lanczos Recursion Relation


\[ \big[F^{(N)}\big]_j \;=\; \frac{1}{\sqrt{2}} \left( \big[F_E^{(N/2)}\big]_{(j \bmod N/2)}
     \;+\; \omega_N^{-j}\, \big[F_O^{(N/2)}\big]_{(j \bmod N/2)} \right) \]


We have reduced the computation of a size N DFT to that of two size (N/2) DFTs
(and a constant time multiplication and addition). Because we can do this all the
way down to a size 1 DFT (which is just the identity operation; check it out), we
are able to compute $F_j$ in log N iterations, each one a small, fixed number of complex
additions and multiplications.
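
One level of the recursion relation can be verified numerically. The sketch below builds a size-N spectrum from the two size-N/2 spectra and can be compared against a direct DFT. It assumes each half-size DFT carries its own $1/\sqrt{N/2}$ normalization, which is why a factor of $1/\sqrt{2}$ appears when the halves are combined. The names dft() and danielsonLanczosStep() are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;   // hypothetical alias for this sketch

// Forward DFT straight from the definition (normalized by 1/sqrt(size)).
std::vector<Cplx> dft(const std::vector<Cplx>& v)
{
    const int N = static_cast<int>(v.size());
    assert(N > 0);
    const double PI = std::acos(-1.0);
    std::vector<Cplx> out(N);
    for (int j = 0; j < N; j++) {
        Cplx sum(0.0, 0.0);
        for (int k = 0; k < N; k++)
            sum += v[k] * std::polar(1.0, -2.0 * PI * ((j * k) % N) / N);
        out[j] = sum / std::sqrt(static_cast<double>(N));
    }
    return out;
}

// Size-N spectrum assembled from the even and odd half-size spectra.
std::vector<Cplx> danielsonLanczosStep(const std::vector<Cplx>& f)
{
    const int N = static_cast<int>(f.size());
    assert(N >= 2 && N % 2 == 0);
    const int half = N / 2;
    const double PI = std::acos(-1.0);
    std::vector<Cplx> even(half), odd(half);
    for (int k = 0; k < half; k++) {
        even[k] = f[2 * k];
        odd[k]  = f[2 * k + 1];
    }
    std::vector<Cplx> Fe = dft(even), Fo = dft(odd);
    std::vector<Cplx> F(N);
    for (int j = 0; j < N; j++) {
        Cplx twiddle = std::polar(1.0, -2.0 * PI * j / N);          // w_N^(-j)
        F[j] = (Fe[j % half] + twiddle * Fo[j % half]) / std::sqrt(2.0);
    }
    return F;
}
```

For any even-length f, danielsonLanczosStep(f) and dft(f) should agree to within round-off.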


We’re Not Quite There Yet 


This is promising: We have to compute N output coordinates, and Danielson-Lanczos 
tells us that we can get each one using, what appears to be log N operations, so it 
seems like we have an O(N log N) algorithm. 


Not so fast (literally and figuratively). 


1. The cost of partitioning f into $f^{even}$ and $f^{odd}$, unfortunately, does require us
running through the full array at each recursion level, so that's a deal breaker.


2. We can fix the above by passing the array “in-place” and just adding a couple 
parameters, START and GAP, down each recursion level. This obviates the need 
to partition the arrays, but, each time we compute a single output value, we still 
end up accessing and adding all N of the original elements (do a little example 
to compute F3 for an 8-element f). 


3. Recursion has its costs, as any computer science student well knows, and there 
are many internal expenses that can ruin our efficiency even if we manage to 
fix the above two items, yet refuse to abandon recursion. 


In fact, it took some 20 years before someone (Tukey and Cooley get the credit)
figured out how to leverage this recursion relation to break the $O(N^2)$ barrier.



21.7.4 Code Samples: Recursive Algorithm 


Before we present a non-recursive algorithm based on Tukey and Cooley's work, we
should confirm that the recursive algorithm actually works and has expected time
complexity of $N^2$.


We'll use an object-oriented (OOP) solution written in C++ whose main utility 
class will be FftUtil. A single FftUtil object will have its own FFT size, provided 
either at instantiation (by a constructor) or during mid-life (by an instance method 
setSize()). Once a size is established all the roots of unity are stored inside the 
object so that subsequent computations using that same object do not require root 
calculations. We’ll have the obvious accessors, mutators and computational methods 
in the class, as demonstrated by a simple client, shown next. Because many of the 
aspects of the class are independent of the details of the computational algorithm, we 
can use this class for the up-coming non-recursive, NlogN, solution. 


Not all the capabilities of the class will be demonstrated, but you'll get the idea. 


First, the client’s view. 


A Client's View of Class FftUtil which Utilizes the Recursive FFT


// client's use of the classes FftUtil and Complex
// to compute a recursive-style FFT
// (the FftUtil and Complex declarations are assumed to be in scope)
#include <iostream>
using namespace std;

// ------- client main --------------------------------
int main()
{
   const int FFT_SIZE = 8;
   int k;
   bool const TIMING = false;
   Complex *a = new Complex[FFT_SIZE];

   // load input array a
   for ( k = 0; k < FFT_SIZE; k++ )
      a[k] = k * .1;

   // instantiate an fft object
   FftUtil fft(FFT_SIZE);

   // load the object's input signal with a[]
   fft.setInSig(a, FFT_SIZE);

   // confirm that the object has the right input
   cout << "IN SIGNAL (recursive)  ---------------- " << endl;
   cout << fft.toStringIn() << endl;

   // instruct object to compute its FFT using recursion
   fft.calcFftRecursive();

   // display the object's output
   cout << "FFT OUT (recursive)  ---------------- " << endl;
   cout << fft.toStringOut() << endl;

   // release dynamic memory
   delete[] a;

   return 0;
}


The client is invoking setInSig() and toStringOut() to transfer the signals between
itself and the object, and also calling calcFftRecursive() to do the actual DFT
computation. The client also uses a simple eight-element input signal for testing,

\[ \{f_k\} \;=\; \{\, 0,\ .1,\ .2,\ .3,\ .4,\ .5,\ .6,\ .7 \,\} . \]

Although the class proves to be only a slow $O(N^2)$ solution, we should not shrug off
its details; as computer scientists, we need to have reliable benchmarks for future
comparison and proof-of-correctness coding runs. Thus, we want to look inside class
FftUtil and then test it.


The publicly exposed calcFftRecursive() called by the client leans on a private
helper not seen by the client: calcFftRecWorker(). First the definition of the public
member method:


// public method that client uses to request FFT computation 
// assumes signal is loaded into private array inSig[] 


bool FftUtil::calcFftRecursive () 


{ 
int k; 
// check for fatal allocation errors 
if (inSig == NULL || outSig == NULL) 
return false; 
// calculate FFT(k) for each k using recursive helper method 
for ( k = 0; k < fftSize; k++ ) 
outSig[k] = (1/sqrt(fftSize)) 
* calcFftRecWorker(inSig, k, fftSize, 1, 1 ); 
return true; 
} 


Here's the private recursive helper which you should compare with the Danielson-
Lanczos relation. It is a direct implementation which emerges naturally from that
formula. (The listing shows it as a stand-alone function, fftFHC(), that takes the
FFT size n and the roots[] array as explicit parameters.)


// private recursive method that does the work
//    n:        full FFT size (the length of roots[])
//    a:        array start location
//    roots:    the n nth roots of unity, in normal order
//    j:        output index of the FFT we are computing
//    size:     size of a[] in current iteration
//    rootPos:  index in the roots[] array where current omega stored
//    gap:      interval between consec a[] elements (so we can pass array in-place)
//    reverse:  true if doing an inverse FFT

Complex fftFHC(int n, Complex a[], Complex roots[],
   int j, int size, int rootPos, int gap, bool reverse )
{
   Complex retVal, even, odd;
   int arrayPos, sizeNxtLevel, rootPosNxtLevel, gapNxtLev;

   // DFT of single element is itself
   if ( size == 1 )
      return a[j * gap];   // since using orig array, need gap

   // locals used for clarity
   sizeNxtLevel = size >> 1;
   rootPosNxtLevel = rootPos << 1;
   gapNxtLev = gap << 1;

   // two recursive calls
   even = fftFHC( n, a, roots, j % sizeNxtLevel, sizeNxtLevel,
      rootPosNxtLevel, gapNxtLev, reverse );
   odd = fftFHC( n, a + gap, roots, j % sizeNxtLevel, sizeNxtLevel,
      rootPosNxtLevel, gapNxtLev, reverse );

   // put the even and odd results together to produce return for current call
   // (inverse FFT wants positive exponent, forward wants negative)
   if (reverse)
      arrayPos = (j * rootPos) % n;          // j * omega
   else
      arrayPos = (j * (n - rootPos)) % n;    // -j * omega
   retVal = even + roots[arrayPos] * odd;

   return retVal;
}


I'll let you analyze the code to show that it does predict $O(N^2)$ big-O timing, but we
will verify it using benchmarks in the next few lines.
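
As a starting point for that analysis, the following sketch counts calls instead of timing them. It assumes only the call structure described above: the worker calls itself twice on half-size arrays until size 1, and the public method invokes the worker once per output coordinate. The function names workerCalls() and totalCalls() are hypothetical.

```cpp
#include <cassert>

// Calls made by ONE top-level worker invocation on an array of the given size:
// the call itself plus two recursive calls on half-size arrays.
// (Closed form: 2*size - 1 when size is a power of two.)
long long workerCalls(int size)
{
    assert(size >= 1);
    if (size == 1)
        return 1;
    return 1 + 2 * workerCalls(size / 2);
}

// The public method runs the worker once per output index, N times in all.
long long totalCalls(int N)
{
    return static_cast<long long>(N) * workerCalls(N);
}
```

Since totalCalls(N) works out to N(2N − 1), doubling N roughly quadruples the count, which is the $N^2$ behavior the benchmarks are expected to show.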


The output confirms that we are getting the correct values. 


/* --------------------- sample run ---------------------

FFT OUT (recursive)  ----------------
0.989949  -0.141421+0.341421i  -0.141421+0.141421i  -0.141421+0.0585786i
-0.141421  -0.141421-0.0585786i  -0.141421-0.141421i  -0.141421-0.341421i

Press any key to continue
------------------------------------------------------- */


And we loop through different input-array sizes to see how the time complexity shakes 
out: 


FFT size 1024 Recursive FFT: 0.015 seconds. 
FFT size 2048 Recursive FFT: 0.066 seconds. 
FFT size 4096 Recursive FFT: 0.259 seconds. 
FFT size 8192 Recursive FFT: 1.014 seconds. 
FFT size 16384 Recursive FFT: 4.076 seconds. 


The pattern is unmistakable: doubling the array size causes the time to grow four-fold.
This is classic $N^2$ time complexity. Besides the growth rate, the absolute times
(four seconds for a modest sized signal) are unacceptable.


21.8 A Non-Recursive, N log N Solution that Defines the FFT


Strictly speaking the above code is not an FFT since it is, frankly, just not that 
fast. A true FFT is an algorithm that produces N log N performance based on 
non-recursive techniques. 


We now study the improved FFT algorithm which consists of two main phases,
bit-reversal and iterative array-building. We'll gain insight into these two sub-algorithms
by previewing the very short high-level FFT code that invokes them.


21.8.1 The High-Level FFT Method 


To class FftUtil, we add public and private instance methods that will compute the 
DFT using a non-recursive algorithm. The highest level public method will be the 
(forward) FFT, called calcFft(). It consists of three method calls: 


1. Bit Reversal. The first implements bit-reversal by creating a new “indexing 
window” into our data. That is, it generates a utility array of indexes that will 
help us reorder the input signal. This utility array is independent of the input 
signal, but it does depend on the FFT size, N. This phase has the overall 
effect of reordering the input signal. 


2. Iterative Array Building. The second applies an iterative algorithm that 
builds up from the newly ordered input array to ultimately replace it by the 
output DFT. 


3. Normalization. Finally, we multiply the result by a normalizing factor. 


Here’s a bird’s eye view of the method. 


// public method that client uses to request FFT computation 
// assumes signal is loaded into private array inSig[] 
bool FftUtil::calcFft() 


{ 
// throwing exception or single up-front test has slight appeal, but 
// following is safer for future changes to constituent methods 
if ( !copyInSigToAuxSigBitRev() ) 
return false; 
if ( !combineEvenOdd(false) ) // false = forward FFT 
return false; 
if ( !normalize() ) 
return false; 
return true; 
} 


21.8.2 Bit-Reversal 


The big break in our quest for speed comes when we recognize that the recursive
algorithm leads, at its deepest nested level, to many tiny order-one arrays, and
that happens after log N method calls. This is the end of the recursion at which point
we compute each of these order-one DFTs, manually. It's the infamous "escape valve"
of recursion. But computing the DFT of those size-one arrays is trivial:

\[ DFT^{(1)}(\{c\}) \;=\; \{c\} , \]

that is, the DFT of any single element array is itself (apply the definition). So we
don't really have to go all the way down to that level; there's nothing to do there.
(Those size-one DFTs are already done, even if they are in a mixed up order
in our input signal.) Instead we can halt recursion when we have size two arrays, at
which point we compute the order two DFTs explicitly. Take a look:

\[ \big[F^{(2)}_{EEOE \cdots OE}\big]_j \;=\; \frac{1}{\sqrt{2}} \left( f_p + (-1)^{-j} f_q \right) , \]

for some p and q, gives us the jth component of the size two DFTs. $F^{(2)}_{EEOE \cdots OE}$
represents one of the many order two DFTs that result from the recursion relation
by taking increasingly smaller even and odd sub-arrays in our recursive descent from
size N down to size two.


The Plan: Knowing that the first order DFTs are just the original input signal's
array elements (whose exact positions are unclear at the moment), our plan is to work
not from the top down, recursively, but from the bottom with the original array
values, and build up from there. In other words, instead of recursing down from size
N to size one, we iterate up from size one to size N. To do that we need to get the
original signal, $\{f_k\}$, in the right order in preparation for this rebuilding process.


So our first task is to re-order the input array so that every pair $f_p$ and $f_q$ that
we want to combine to get a size two DFT end up next to one another. While at
it, we'll make sure that once they are computed, all size two pairs which need to
be combined to get the fourth order DFTs will also be adjacent, and so on. This
reordering is called bit-reversal. The reason for that name will be apparent shortly.


Let's start with an input signal of size $8 = 2^3$ that we wish to transform and define
it such that it'll be easy to track:

\[ f^{(8)} \;=\; \{\, 0,\ .1,\ .2,\ .3,\ .4,\ .5,\ .6,\ .7 \,\} . \]

We start at the top and, using the Danielson-Lanczos recursion relation, see how the
original f decomposes into two four-element even-odd sets, then four two-element
even-odd sets, and finally eight singleton sets. Figure 21.7 shows how we separate the
original eight-element $f^{(8)}$ into $f^{(4)}_E$ and $f^{(4)}_O$, each of length 4.


Figure 21.7: Going from an 8-element array to two 4-element arrays (one even and
one odd)


I'll select one of these, $f^{(4)}_E$, for further processing (figure 21.8).


Figure 21.8: Decomposing the even 4-element array to two 2-element arrays (one even
and one odd)


This time, for variety, we'll recurse on the odd sub-array, $f^{(2)}_{EO}$ (figure 21.9).


Figure 21.9: Breaking a two-element array into two singletons


Remember, our goal is to re-order the input array to reposition pairs such as these
next to one-another. The last picture told us that we want to see $f_2$ repositioned so
it is next to $f_6$ (figure 21.10).


Figure 21.10: Final position of $f_2$ should be next to $f_6$.


Once this happens, we are at the bottom of recursion. These one-element arrays
are their own DFTs, and as such, are neither even nor odd. Each singleton is some
$F_0$ (more accurately, $[F^{(1)}_{EOE}]_0$ and $[F^{(1)}_{EOO}]_0$).

Aside: Recall that we needed to take j (mod N/2) at each level, but when N/2
is 1, anything mod 1 is 0, which is why we are correct in calling these $F_0$ for different
one-element F arrays (figure 21.11).


Figure 21.11: Singletons are their own DFTs


If we had taken a different path, we'd have gotten a different pair. Doing it for
all possibilities would reveal that we would want the original eight values paired as
follows:

\[
\begin{aligned}
f_2 &\;\longleftrightarrow\; f_6 \\
f_3 &\;\longleftrightarrow\; f_7 \\
f_0 &\;\longleftrightarrow\; f_4 \\
f_1 &\;\longleftrightarrow\; f_5
\end{aligned}
\]

Now, we want more than just adjacent pairs; we'd like the two-element DFTs
that they generate to also be next to one another. Each of these pairs has to be
positioned properly with respect to the rest. Now is the time for us to stand on the
shoulders of the giants who came before and write down the full ordering we seek.
This is shown in figure 21.12. (Confirm that the above pairs are adjacent.)


What you are looking at in figure 21.12 is the bit-reversal arrangement. It's so
named because in order to get $f_6$, say, into its correct position for transform building,
we "reverse the bits" of the integer index 6 (relative to the size of the overall transform,
8, which is three bits). It's easier to see than say: 6 = 110 reversed is 011 = 3, and
indeed you will find the original $f_6$ = .6 ends up in position 3 of the bit-reversed
array. On the other hand, 5 = 101 is its own bit reversed index, so it should not
(and does not) change array positions.


Figure 21.12: Bit-reversal reorders an eight-element array


Bit Reversal Mapping for 3-bit Integers


   0 = 000  →  000 = 0
   1 = 001  →  100 = 4
   2 = 010  →  010 = 2
   3 = 011  →  110 = 6
   4 = 100  →  001 = 1
   5 = 101  →  101 = 5
   6 = 110  →  011 = 3
   7 = 111  →  111 = 7


Bit Reversal Code 


The Code. The bit reversal procedure uses a managing method, allocateBitRev(),
which calls a helper, reverseOneInt(). First, a look at the methods:


// called once for a given FFT size and not repeated for that size
// produces array bitRev[] used to load input signal
void FftUtil::allocateBitRev()
{
   int k;

   if ( bitRev != NULL )
      delete[] bitRev;   // array form of delete, to match the new[] below

   bitRev = new int[fftSize];

   for ( k = 0; k < fftSize; k++ )
      bitRev[k] = reverseOneInt(k);

   return;
}


int FftUtil::reverseOneInt(int inVal)
{
   int retVal, logSize, inValSave;
   inValSave = inVal;

   // inVal and retVal are array locations, and size of the array is fftSize
   retVal = 0;
   for (logSize = fftSize >> 2; logSize > 0; logSize >>= 1)
   {
      retVal |= (inVal & 1);
      retVal <<= 1;
      inVal >>= 1;
   }

   // adjusts for off-by-one last half of array: the loop handles all but the
   // top bit of inVal, which (when set) becomes the low bit of the result
   if (inValSave >= (fftSize>>1))
      retVal++;

   return retVal;
}


Time Complexity 


The driver method has a simple loop of size N making it O(N). In that loop, it calls
the helper, which careful inspection reveals to be O(log N): the loop in that helper is
managed by the statement logSize >>= 1, which halves the size of array each pass,
an action that always means log growth. Since this is a nesting of two loops the full
complexity is O(N log N).

Maybe Constant Time? This is done in series with the second phase of FFT
rebuilding so this complexity and that of the next phase do not multiply; we will take
the slower of the two. But the story gets better. Bit reversal need only be done once
for any size N and can be skipped when new FFTs of the same size are computed.
It prepares a static array that is independent of the input signal. In that sense, it is
really a constant time operation for a given FFT of order N.


Either way you look at it, this preparation code won’t affect the final complexity 
since we’re about to see that the next phase is also O(N log N) and the normalization 
phase is only O(N) making the full algorithm O(N log N) whether or not we count 
bit reversal. 


21.8.3 Rebuilding from the Bit-Reversed Array 


We continue to use the Danielson-Lanczos recursion relation to help guide our next
steps. Once the input array is bit-reversed, we need to build up from there. Figure 21.13
shows how we do this at the first level, using the singleton values to build
the order-two DFTs.


Figure 21.13: Building the order-two DFT $F^{(2)}_{EO}$ from the singletons $F^{(1)}_{EOE}$ and $F^{(1)}_{EOO}$


Note that since we are building a two-element DFT, we use the 2nd root of unity,
a.k.a. "−1." After we do this for all the pairs, thus replacing all the singletons with
two-element DFTs, we repeat the process at the next level: we build the four-element
arrays from these two-element arrays. Figure 21.14 shows the process on one of the
two four-element arrays, $F^{(4)}_E$. This time, we are using the 4th root of unity, i, as the
multiplier of the odd term.


Figure 21.14: Building the order-four DFT $F^{(4)}_E$ from the order-two DFTs $F^{(2)}_{EE}$ and $F^{(2)}_{EO}$


After the four-element $F^{(4)}_E$ we compute its companion four-element DFT, $F^{(4)}_O$,
then go to the next and final level, the computation of the full output array $F^{(8)}$. If
the original signal were length 16, 32 or greater, we'd need to keep iterating this outer
loop, building higher levels until we reached the full size of the input (and output)
array.

Bottom-Up Array-Building Code 


The implementation will contain loops that make it easier to "count" the big-O of 
this part, so first have a casual look at the class methods. 


The Code. The private method, combineEvenOdd(), which does all the hard 
work, uses an iterative approach that mirrors the diagrams. Before looking at the 
full definition, it helps to first imagine processing the pairs only, which turns those 
singleton values into order-two DFTs. Recall that for that first effort, we use −1 as 
the root of unity and either add or subtract, 


$$F^{(EO)}_j \;=\; F^{(EOE)} \;+\; (-1)^{-j}\, F^{(EOO)}, \qquad \text{for } j = 0, 1,$$


which replaces those eight one-element DFTs with four two-element DFTs. The 
code to do that isn't too bad. 


Two-Element DFTs from Singletons: 


   // this computes the DFT of length 2 from the DFTs of length 1 using 
   //    F0 = f0 + f1 
   //    F1 = f0 - f1 
   // and does so, pairwise (after bit-reversal re-ordering). 
   // It represents the first iteration of the loop, but has the concepts 
   // of the recursive FFT formula. 

   // the roots[0], ... , roots[fftSize-1] are the nth roots in normal order 
   // which implies that -1 would be found at position fftSize/2 

   rootPos = fftSize/2;   // identifies location of the -1 = omega in first pass 
   for (base = 0; base < fftSize; base += 2) 
   { 
      for (j = 0; j < 2; j++) 
      { 
         arrayPos = (j * (fftSize - rootPos)) % fftSize;   // -j * omega 
         outSig[base + j] = inSig[base] + roots[arrayPos] * inSig[base + 1]; 
      } 
   } 


If we were to apply only this code, we would be replacing the adjacent pairs (after 
bit-reversal) by their order-two DFTs. For an input signal 


$$\{ f_k \}_{k=0}^{7} \;=\; \{ 0, \; .1, \; .2, \; .3, \; .4, \; .5, \; .6, \; .7 \},$$

the bit reversal would produce 

$$\{ 0, \; .4, \; .2, \; .6, \; .1, \; .5, \; .3, \; .7 \},$$

and the above loop would replace these by the order-two DFTs to produce 

$$\{ .4, \; -.4, \; .8, \; -.4, \; .6, \; -.4, \; 1, \; -.4 \}.$$


(We are not normalizing by $1/\sqrt{N}$ until later.) 
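The numbers in this example can be reproduced with a few lines of code. The following is only a sketch: it assumes the sample signal is {0, .1, ..., .7} (the signal consistent with the quoted output), and the free functions bitReverse() and firstPass() are hypothetical stand-ins, not the FftUtil class methods.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the text): reorder a length-2^n signal
// so that element k lands at the bit-reversal of index k.
std::vector<double> bitReverse(const std::vector<double>& sig)
{
    const std::size_t n = sig.size();
    std::size_t bits = 0;
    while ((std::size_t(1) << bits) < n)
        ++bits;

    std::vector<double> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t rev = 0;
        for (std::size_t b = 0; b < bits; ++b)
            if (k & (std::size_t(1) << b))
                rev |= std::size_t(1) << (bits - 1 - b);
        out[rev] = sig[k];
    }
    return out;
}

// First pass only: replace each adjacent pair (a, b) by (a + b, a - b),
// i.e. the order-two DFT built with the root of unity -1 (no normalization).
std::vector<double> firstPass(const std::vector<double>& reversed)
{
    std::vector<double> out(reversed.size());
    for (std::size_t base = 0; base + 1 < reversed.size(); base += 2) {
        out[base]     = reversed[base] + reversed[base + 1];
        out[base + 1] = reversed[base] - reversed[base + 1];
    }
    return out;
}
```

Calling firstPass(bitReverse(signal)) on the eight-element signal above should return the pairwise sums and differences listed in the text.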


In the second generation of the loop we would combine the order-two DFTs to 
produce size-four DFTs. This involves the following adjustments to the code: 


• base += 2 → base += 4 


• rootPos = fftSize/2 → rootPos = fftSize/4 (locates $i$ rather than −1) 


• inSig[base] → inSig[base + (j % 2)] (original inSig[base] was 
equivalent to inSig[base + (j % 1)], since j % 1 = 0, always). 


• inSig[base + 1] → inSig[base + 2 + (j % 2)] 


I'll let you write out the second iteration of the code that will combine DFTs of 
length two and produce DFTs of length four. After doing that exercise, one can see 
that the literals 1, 2, 4, etc. should be turned into a variable, groupSize, over which 
we loop (by surrounding the above code in an outer groupSize-loop). The result 
would be the final method. 


Private Workhorse method combineEvenOdd(): 


   // private non-recursive method builds FFT from 2-elements up 
   // reverse: true if doing an inverse FFT 
   bool FftUtil::combineEvenOdd(bool reverse) 
   { 
      // instance members of class ------------------- 
      //   outSig[], auxSig[]: working member arrays 
      //   roots[]: precomputed roots of unity 
      //   fftSize: size of current FFT 

      // key local variables 
      //   rootPos: index in the roots[] array where current omega stored 
      //   j: output index FFT being computed 
      //   groupSize: power-of-2 size whose sub-FFT we are computing. 
      //      increases from 1 to fftSize, doubling each time 

      int base, j, rootPos, arrayPos, groupSize, nextGroupStart; 
      Complex omegaToJ; 

      // check for fatal allocation problems 
      if (outSig == NULL || auxSig == NULL || roots == NULL) 
         return false; 

      // process entire array in groups of 1, 2, 4, ... up to fftSize 
      for (groupSize = 1; groupSize < fftSize; groupSize <<= 1) 
      { 
         nextGroupStart = groupSize << 1; 
         rootPos = fftSize / nextGroupStart; 
         for (base = 0; base < fftSize; base += nextGroupStart) 
         { 
            for (j = 0; j < nextGroupStart; j++) 
            { 
               // allow forward and inverse fft 
               if (reverse) 
                  arrayPos = (j * rootPos) % fftSize;               // j * omega 
               else 
                  arrayPos = (j * (fftSize - rootPos)) % fftSize;   // -j * omega 
               omegaToJ = roots[arrayPos]; 

               // Danielson and Lanczos formula 
               outSig[base + j] = auxSig[ base + (j % groupSize) ] 
                  + omegaToJ * auxSig[ base + groupSize + (j % groupSize) ]; 
            } 
         } 

         // rather than be clever, copy array back to aux-input for next pass 
         if ( (groupSize << 1) < fftSize ) 
            copyOutSigToAuxSig(); 
      } 
      return true; 
   } 




Time Complexity 


At first glance it might look like a triple nested loop, leading to some horrific cubic 
performance. Upon closer examination we are relieved to find that 


• the outer loop is a doubling of groupSize until it reaches N, essentially a log N 
proposition, and 


• the two inner loops together constitute only N loop passes. 


Thus, array building is O(N log N), as we had hoped. 
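The loop count can be checked empirically. The sketch below copies only the loop bounds of combineEvenOdd() and counts how often the innermost statement would execute; countInnerPasses() is a hypothetical name introduced for this illustration and does no FFT arithmetic.

```cpp
#include <cassert>

// Count executions of the innermost statement of a combineEvenOdd()-style
// triple loop. fftSize is assumed to be a power of 2.
long countInnerPasses(int fftSize)
{
    long passes = 0;
    for (int groupSize = 1; groupSize < fftSize; groupSize <<= 1) {
        int nextGroupStart = groupSize << 1;
        // the two inner loops together visit each of the fftSize slots once
        for (int base = 0; base < fftSize; base += nextGroupStart)
            for (int j = 0; j < nextGroupStart; j++)
                ++passes;
    }
    return passes;
}
```

For a power-of-2 size the count comes out to exactly N log N: the outer loop runs log N times and each run touches N output slots.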


21.8.4 Normalization 


To complete the trio, we have to write the normalize() method. It's a simple 
linear-complexity loop that does not add to the growth of the algorithm. 


The Normalize Method 


   bool FftUtil::normalize() 
   { 
      double factor; 
      int k; 

      if (outSig == NULL) 
         return false; 

      factor = 1. / sqrt(mSize); 
      for (k = 0; k < mSize; k++) 
         outSig[k] = outSig[k] * factor; 
      return true; 
   } 


21.8.5 Overall Complexity 


We have three methods in series, and the most costly of them is O(N log N), making 
the entire FFT O(N log N). 
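To see the three phases working together, here is a compact, free-standing sketch. It is not the FftUtil class: it assumes std::complex in place of the text's Complex type, computes each root of unity on the fly instead of using a precomputed roots[] array, and the names fftIterative() and dftDirect() are hypothetical. The direct O(N²) DFT is included only as a cross-check.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Cplx = std::complex<double>;

// Iterative FFT sketch: bit-reverse, build up with the Danielson-Lanczos
// relation, then normalize by 1/sqrt(N). The forward transform uses
// omega^(-jk). Assumes sig.size() is a power of 2.
std::vector<Cplx> fftIterative(std::vector<Cplx> sig)
{
    const std::size_t N = sig.size();
    const double PI = std::acos(-1.0);

    // phase 1: bit-reversal reordering; rev tracks the bit-reversal of k
    for (std::size_t k = 0, rev = 0; k < N; ++k) {
        if (k < rev)
            std::swap(sig[k], sig[rev]);
        std::size_t bit = N >> 1;
        while (bit != 0 && (rev & bit)) { rev ^= bit; bit >>= 1; }
        rev |= bit;
    }

    // phase 2: combine groups of size 1, 2, 4, ... (the groupSize loop)
    std::vector<Cplx> out(N);
    for (std::size_t groupSize = 1; groupSize < N; groupSize <<= 1) {
        const std::size_t next = groupSize << 1;
        for (std::size_t base = 0; base < N; base += next)
            for (std::size_t j = 0; j < next; ++j) {
                // omega_next^(-j), with omega_next = exp(2 pi i / next)
                const Cplx w = std::polar(1.0, -2.0 * PI * double(j) / double(next));
                out[base + j] = sig[base + (j % groupSize)]
                              + w * sig[base + groupSize + (j % groupSize)];
            }
        sig = out;   // copy back for the next pass
    }

    // phase 3: normalization
    const double factor = 1.0 / std::sqrt(double(N));
    for (Cplx& v : sig)
        v *= factor;
    return sig;
}

// Direct O(N^2) DFT with the same sign and normalization, for checking only.
std::vector<Cplx> dftDirect(const std::vector<Cplx>& f)
{
    const std::size_t N = f.size();
    const double PI = std::acos(-1.0);
    std::vector<Cplx> F(N);
    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t k = 0; k < N; ++k)
            F[j] += f[k] * std::polar(1.0, -2.0 * PI * double(j * k) / double(N));
        F[j] /= std::sqrt(double(N));
    }
    return F;
}
```

Recomputing the roots with std::polar keeps the sketch short; a production version would look them up in a precomputed table, as the class in the text does.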


21.8.6 Software Testing 


The only thing left to do is time this and compare with the recursive approach. Here’s 
the output: 


FFT size 1024 Non-recursive FFT: 0 seconds. 
FFT size 2048 Non-recursive FFT: 0 seconds. 
FFT size 4096 Non-recursive FFT: 0.001 seconds. 
FFT size 8192 Non-recursive FFT: 0.001 seconds. 
FFT size 16384 Non-recursive FFT: 0.003 seconds. 
FFT size 32768 Non-recursive FFT: 0.007 seconds. 
FFT size 65536 Non-recursive FFT: 0.016 seconds. 


It is slightly slower than linear, the difference being the expected factor of log N 
(although we couldn't tell that detail from the above times). Not only is this evidence 
of the N log N time complexity, but it is orders of magnitude faster than the recursive 
algorithm (4.076 seconds vs. .003 seconds for a 16k array). We finally have our true 
FFT. 


This gives us more (far more) than enough classical background in order to tackle 
the quantum Fourier transform. 


Chapter 22 


The Quantum Fourier Transform 


22.1 From Classical Fourier Theory to the QFT 


Today you'll meet the capstone of our four chapter sequence, the quantum Fourier 
transform, or QFT. 


Chapter 19 [Real Fourier Series → Complex Fourier Series] 
↓ 
Chapter 20 [Continuous Fourier Transform] 
↓ 
Chapter 21 [Discrete Fourier Transform → Fast Fourier Transform] 
↓ 


This Chapter [Quantum Fourier Transform] 


If you skipped the three earlier lessons, you can always refer to them on an as-needed 
basis. They contain every detail and definition about classical Fourier theory that 
will be cited today, all developed from scratch. Search the table of contents for any 
topic you want to review if you hit a snag. 


22.2 Definitions 


22.2.1 From $\mathbb{C}^{2^n}$ to $\mathcal{H}_{(n)}$ 


We know that the discrete Fourier transform of order N, or DFT (its order usually 
implied by context), is a special operator that takes an N-dimensional complex vector 
$f = (f_k)$ to another complex vector $F = (F_j)$. In symbols, 


$$\mathcal{DFT} : \mathbb{C}^N \longrightarrow \mathbb{C}^N, \qquad \mathcal{DFT}(f) \longmapsto F,$$


or, if we want to emphasize the coordinates, 

$$\mathcal{DFT}\,[(f_k)] \longmapsto (F_j).$$


When we moved to the fast Fourier transform the only change (other than the 
implementation details) was that we required $N = 2^n$ be a power of 2. Symbolically, 


$$\mathcal{FFT} : \mathbb{C}^{2^n} \longrightarrow \mathbb{C}^{2^n}, \qquad \mathcal{FFT}\,[(f_k)] \longmapsto (F_j).$$


We maintain this restriction on N and continue to take n to be log N (base 2 always 
implied). 

We'll build the definition of the QFT atop our firm foundation of the DFT, 
so we start with 

$$\mathcal{DFT} : \mathbb{C}^{2^n} \longrightarrow \mathbb{C}^{2^n},$$

a $2^n$th order mapping of $\mathbb{C}^{2^n}$ to itself, and use that to define the $2^n$th order QFT 
acting on an nth order Hilbert space, 


$$\mathcal{QFT} : \mathcal{H}_{(n)} \longrightarrow \mathcal{H}_{(n)}.$$

[Order. The word "order," when describing the $\mathcal{DFT} = \mathcal{DFT}^{(N)}$, means N, the 
dimension of the underlying space, while the same word, "order," when applied to a 
tensor product space $\mathcal{H}_{(n)}$, is n, the number of component, single-qubit spaces in the 
product. The two "orders" are not the same: $N = 2^n$ or, equivalently, $n = \log N$.] 


22.2.2 Approaches to Operator Definition 


We know that there are various ways to define a unitary operator, U, 

$$U : \mathcal{H}_{(n)} \longrightarrow \mathcal{H}_{(n)}.$$

Among them, three are prominent. 


1. Describe the action of U on the $N = 2^n$ CBS kets, $\{ |x\rangle^n \} = \{ |0\rangle^n, \ldots, |N-1\rangle^n \}$, 
and extend linearly. 


2. Describe the action of U on an arbitrary $|\psi\rangle^n \in \mathcal{H}_{(n)}$, by expressing $U|\psi\rangle^n$ in 
terms of its $N = 2^n$ complex coefficients $c_0, \ldots, c_{N-1}$. 


3. Give the matrix for U in the standard basis, used to convert $|\psi\rangle^n$'s coefficients 
to $U|\psi\rangle^n$'s coefficients through matrix multiplication. 


In each case, we have to confirm that U is both linear and unitary (although in 
methods 1 and 3, we get linearity for free since it is built-into those definitions). 
Also, we only have to use one technique, since the other two can be surmised using 
linear algebra. However, if we use a different method than someone else to define the 
same operator, we had better check that our definition and theirs are equivalent. 


22.2.3 Review of Hadamard Operator 


Let's make sure we understand these concepts before defining the QFT by reprising 
an example from our past, the Hadamard operator. 


First Order Hadamard 


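We used method 1 to define $H = H^{\otimes 1}$, by 


$$H|0\rangle \equiv \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad H|1\rangle \equiv \frac{|0\rangle - |1\rangle}{\sqrt{2}},$$

and extended, linearly. We might have used method 2, this way: 

$$\text{If } |\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \quad \text{then } H|\psi\rangle \equiv \left(\frac{\alpha + \beta}{\sqrt{2}}\right)|0\rangle + \left(\frac{\alpha - \beta}{\sqrt{2}}\right)|1\rangle.$$


Finally, some authors use method 3 to define the matrix for H, 


$$M_H \equiv \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$


which can then be used to produce the formulas for either method 1 or method 2. 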


Nth Order Hadamard 


Meanwhile, for the n-fold Hadamard, we didn't need to define it: we just derived it 
using the laws of tensor products. We found (using method 1), that 


$$H^{\otimes n}|x\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^n \sum_{y=0}^{2^n - 1} (-1)^{x \cdot y}\, |y\rangle^n.$$

(Remember that $x \cdot y = x \odot y$ is the mod-2 pairing (a pseudo inner product) of the 
two bit strings.) 

Because it is so much easier to express operators on the CBS, $\{ |x\rangle^n \}$, than to say 
what they do on the complex amplitudes, we normally do it that way, just as we did 
with $H^{\otimes n}$. Indeed, this is how about half of the authors define the QFT. 


Despite the sales pitch for method 1, I think it is more instructive to use method 2 
for the QFT, then afterward show how it can be done using method 1, being careful 
to show the equivalence of the two approaches. 


22.2.4 Defining the QFT 


As long as we're careful to check that our definition provides a linear and unitary 
transformation, we are free to use a state's coordinates to define it. Consider a general 
state $|\psi\rangle^n$ and its preferred basis amplitudes (a.k.a. coordinates or coefficients), 


$$|\psi\rangle^n \;\longleftrightarrow\; (c_x)_{x=0}^{N-1} = \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{N-1} \end{pmatrix}, \qquad N = 2^n.$$


We describe how the order-N $\mathcal{QFT} = \mathcal{QFT}^{(2^n)}$ acts on the $2^n$ coefficients, and this 
will define the QFT for any state in $\mathcal{H}_{(n)}$. 


Concept Definition of the Order-N QFT 


$$\text{If } |\psi\rangle^n \longleftrightarrow (c_x)_{x=0}^{N-1}, \quad \text{then } \mathcal{QFT}^{(N)}|\psi\rangle^n \longleftrightarrow (\tilde{c}_y)_{y=0}^{N-1}.$$


In words, starting from $|\psi\rangle^n$, we form the vector of its amplitudes, $(c_x)$; we treat $(c_x)$ 
like an ordinary complex vector of size $2^n$; we take its $\mathcal{DFT}^{(N)}$ to get another vector 
$(\tilde{c}_y)$; we declare the coefficients $\{ \tilde{c}_y \}$ to be the amplitudes of our desired output state, 
$\mathcal{QFT}^{(N)}|\psi\rangle^n$. The end. 


Expressing the Order of the QFT 


If we need it, we'll display the QFT's order in the superscript with the notation 
$\mathcal{QFT}^{(N)}$. Quantum computer scientists don't usually specify the order in diagrams, 
so we'll often go with a plain "QFT" and remember that it operates on an order-n 
Hilbert space having dimension $N = 2^n$. 


Explicit Definition of the Order-N QFT 


The concept definition, expressed formulaically, says that 


$$\text{if } |\psi\rangle^n = \sum_{x=0}^{N-1} c_x\, |x\rangle^n, \quad \text{then } \mathcal{QFT}^{(N)}|\psi\rangle^n \equiv \sum_{y=0}^{N-1} \tilde{c}_y\, |y\rangle^n.$$


We can really feel the Fourier transform concept when we view the states as complex 
vectors $|\psi\rangle^n = c = (c_x)$ of size $2^n$ to which we subject the standard DFT, 


$$\mathcal{QFT}\,|\psi\rangle^n \equiv \sum_{y=0}^{N-1} \left[ \mathcal{DFT}(c) \right]_y |y\rangle^n.$$

The definition assumes the reader can compute a DFT, so we'd better unwind our 
definition by expressing the QFT explicitly. The yth coordinate of the output QFT 
is produced using 


$$\left[ \mathcal{QFT}\,|\psi\rangle^n \right]_y = \tilde{c}_y = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} c_x\, \omega^{yx},$$


where $\omega = \omega_N$ is the primitive Nth root of unity. The yth coordinate can also be 
obtained using the dot-with-the-basis-vector trick, 


$$\left[ \mathcal{QFT}\,|\psi\rangle^n \right]_y = {}^n\langle y |\, \mathcal{QFT}\, |\psi\rangle^n,$$


so you might see this notation, rather than the subscript, used by physics-oriented authors. 
For example, it could appear in the definition, or even computation, of the yth 
coordinate of the QFT, 


$${}^n\langle y |\, \mathcal{QFT}\, |\psi\rangle^n = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} c_x\, \omega^{yx}.$$

We'll stick with the subscript notation, $\left[ \mathcal{QFT}\,|\psi\rangle^n \right]_y$, for now. 


Notation and Convention 


Hold the phone. Doesn't the DFT have a negative root exponent, $\omega^{-jk}$? The way I 
defined it, yes. As I already said, there are two schools of thought regarding forward 
vs. reverse Fourier transforms and DFTs. I really prefer the negative exponent for the 
forward transform because it arises naturally when decomposing a function into its 
frequencies. But there is only one school when defining the QFT, and it has a positive 
root exponent in the forward direction (and negative in the reverse). Therefore, I have 
to switch conventions. 


[If you need more specifics, here are three options. (i) You can go back and define 
the forward DFT using positive exponent from the start ... OR ... (ii) You can 
consider the appearance of "DFT" in the above expressions as motivational but rely 
only on the explicit formulas for the formal definition of the QFT without anxiety 
about the exponent's sign difference ... OR ... (iii) You can preserve our original DFT 
but modify the definition of QFT by replacing "DFT" with "DFT$^{-1}$" everywhere 
in this section.] 


Anyway, we won't be referring to the DFT, only the QFT, effective immediately, 
so the discrepancy starts and ends here. 


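Full Definition of QFT using Method 2 


You should also be ready to see the full output vector, $\mathcal{QFT}\,|\psi\rangle^n$, expanded along 
the CBS, 


$$\text{if } |\psi\rangle^n = \sum_{x=0}^{N-1} c_x\, |x\rangle^n, \quad \text{then } \mathcal{QFT}\,|\psi\rangle^n = \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \sum_{x=0}^{N-1} c_x\, \omega^{yx}\, |y\rangle^n.$$


[Exercise. Show how this is a consequence of the earlier definition.] 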
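A direct, unoptimized evaluation of this coefficient formula makes the definition concrete. The sketch below is classical simulation only, assuming std::complex amplitudes; qftAmplitudes() is a hypothetical name, and its O(N²) double loop says nothing about how a quantum circuit would realize the operator.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Direct O(N^2) evaluation of the coefficient definition
//    c~_y = (1/sqrt(N)) * sum_x  c_x * omega^(y x),   omega = exp(2 pi i / N).
// Note the positive exponent, per the QFT convention.
std::vector<Cplx> qftAmplitudes(const std::vector<Cplx>& c)
{
    const std::size_t N = c.size();
    const double PI = std::acos(-1.0);
    std::vector<Cplx> out(N);
    for (std::size_t y = 0; y < N; ++y) {
        for (std::size_t x = 0; x < N; ++x) {
            // reduce the exponent mod N; omega^N = 1, so the value is unchanged
            const double angle = 2.0 * PI * double((y * x) % N) / double(N);
            out[y] += c[x] * std::polar(1.0, angle);
        }
        out[y] /= std::sqrt(double(N));
    }
    return out;
}
```

Feeding in the amplitudes of the CBS ket $|1\rangle^2$, that is (0, 1, 0, 0), returns (1/2)(1, i, −1, −i), the powers of the 4th root of unity scaled by $1/\sqrt{4}$.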


Typical Definition of QFT Using Method 1 


It's more common in quantum computing to take the CBS route, and in that case 
the definition of the QFT would be 


$$\mathcal{QFT}\,|x\rangle^n \equiv \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega^{yx}\, |y\rangle^n,$$


from which we would get the definition for arbitrary states by applying linearity to 
their CBS expansion. Our task at hand is to make sure 


1. the definitions are equivalent, and 


2. they produce a linear, unitary operator. 


Vetting a putative operator like this represents a necessary step in demonstrating 
the viability of our ideas, so it's more than just an academic exercise. You may be 
doing this on your own operators some day. Imagine future students studying "the 
[your name here] transform". It can happen. 


Equivalence of the Definitions 


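Step 1) Agreement on CBS. We'll show that the coefficient definition, $\mathcal{QFT}_{\text{ours}}$, 
agrees with the typical CBS definition, $\mathcal{QFT}_{\text{cbs}}$, on the CBS. Consider the CBS $|x\rangle^n$. 
It is a tall $2^n$-component vector with a 1 in position x: 


$$|x\rangle^n \;\longleftrightarrow\; \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \longleftarrow x\text{th coefficient}$$

$$|x\rangle^n = \sum_{k=0}^{N-1} c_k\, |k\rangle^n = \sum_{k=0}^{N-1} \delta_{kx}\, |k\rangle^n.$$

Now apply our definition to this ket and see where it leads: 

$$\begin{aligned}
\mathcal{QFT}_{\text{ours}}\,|x\rangle^n &= \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \sum_{k=0}^{N-1} c_k\, \omega^{yk}\, |y\rangle^n \\
&= \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \sum_{k=0}^{N-1} \delta_{kx}\, \omega^{yk}\, |y\rangle^n \\
&= \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega^{yx}\, |y\rangle^n \\
&= \mathcal{QFT}_{\text{cbs}}\,|x\rangle^n. \qquad \text{QED}
\end{aligned}$$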


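Step 2) Linearity. We must also show that $\mathcal{QFT}_{\text{ours}}$ is linear. Once we do that 
we'll know that both it and $\mathcal{QFT}_{\text{cbs}}$ (linear by construction) not only agree on the 
CBS but are both linear, forcing the two to be equivalent over the entire $\mathcal{H}_{(n)}$. 


The definition of $\mathcal{QFT}_{\text{ours}}$ in its expanded form is 

$$\mathcal{QFT}_{\text{ours}}\,|\psi\rangle^n = \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \sum_{x=0}^{N-1} c_x\, \omega^{yx}\, |y\rangle^n,$$


which, on close inspection, we recognize as matrix multiplication of the vector $(c_x)$ 
by the matrix $(\omega^{yx})$ (the factor of $1/\sqrt{N}$ notwithstanding). If this is not obvious, 
here's a "slo-mo" version of that statement. 


$$\mathcal{QFT}_{\text{ours}}\,|\psi\rangle^n \;\longleftrightarrow\; \frac{1}{\sqrt{N}} \begin{pmatrix} 1 & 1 & 1 & \cdots \\ 1 & \omega & \omega^2 & \cdots \\ 1 & \omega^2 & \omega^4 & \cdots \\ 1 & \omega^3 & \omega^6 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{pmatrix}$$


Whenever an operator's coordinates are transformed by matrix multiplication, the 
operator is linear. QED 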


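Step 3) Unitarity. Now that we have the matrix for the QFT, 


$$M_{QFT} = \frac{1}{\sqrt{N}} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{N-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)(N-1)} \end{pmatrix},$$


we can use $M_{QFT}$ to confirm that QFT is unitary: if the matrix is unitary, so is 
the operator. (Caution. This is only true when the basis in which the matrix is 
expressed is orthonormal, which $\{ |x\rangle^n \}$ is.) We need to show that 


$$(M_{QFT})^\dagger\, M_{QFT} = \mathbb{1},$$


so let's take the dot product of row x of $(M_{QFT})^\dagger$ with column y of $M_{QFT}$: 


$$\frac{1}{\sqrt{N}} \left( 1,\; \omega^{-x},\; \omega^{-2x},\; \ldots \right) \cdot \frac{1}{\sqrt{N}} \begin{pmatrix} 1 \\ \omega^{y} \\ \omega^{2y} \\ \vdots \end{pmatrix} = \frac{1}{N} \sum_{k=0}^{N-1} \omega^{k(y-x)} = \frac{1}{N} \left( N \delta_{xy} \right) = \delta_{xy}. \qquad \text{QED}$$


(The second-from-last identity is from Exercise D (roots-of-unity section) in the 
early lesson on complex arithmetic.) 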
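The same row-times-column computation can be carried out numerically for any particular N. This is only a spot check under floating-point tolerance, not a proof; the names Matrix, qftMatrix() and isUnitary() are hypothetical helpers introduced for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;
using Matrix = std::vector<std::vector<Cplx>>;

// Entry (y, x) of the QFT matrix: (1/sqrt(N)) * omega^(y x).
Matrix qftMatrix(std::size_t N)
{
    const double PI = std::acos(-1.0);
    const double scale = 1.0 / std::sqrt(double(N));
    Matrix M(N, std::vector<Cplx>(N));
    for (std::size_t y = 0; y < N; ++y)
        for (std::size_t x = 0; x < N; ++x)
            M[y][x] = std::polar(scale, 2.0 * PI * double((y * x) % N) / double(N));
    return M;
}

// Check M^dagger M = identity, entry by entry, within a tolerance.
bool isUnitary(const Matrix& M, double tol)
{
    const std::size_t N = M.size();
    for (std::size_t x = 0; x < N; ++x)
        for (std::size_t y = 0; y < N; ++y) {
            // row x of M^dagger dotted with column y of M
            Cplx dot(0.0, 0.0);
            for (std::size_t k = 0; k < N; ++k)
                dot += std::conj(M[k][x]) * M[k][y];
            const Cplx expected = (x == y) ? Cplx(1.0, 0.0) : Cplx(0.0, 0.0);
            if (std::abs(dot - expected) > tol)
                return false;
        }
    return true;
}
```

A call such as isUnitary(qftMatrix(8), 1e-12) exercises the identity for N = 8; perturbing any single entry of the matrix should make the check fail.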


22.3 Features of the QFT 


22.3.1 Shift Property 


The shift property of the DFT, 


$$f_{k-l} \;\overset{\mathcal{DFT}}{\longmapsto}\; \omega^{-lk}\, F_k,$$


when plugged into the definition of QFT results in a quantum translation invariance 


OFT |\t=-2) = 6? loys = BOF Tas 


_ il 
JN 


ll 
fos) 


y 


We lose the minus exponent of $\omega$ because the QFT uses a positive exponent in its 
forward direction. 


22.3.2 A Comparison between QFT and H 


You may find it enlightening to see some similarities and differences between the 
Hadamard operator and the QFT. 


We are working in an $N = 2^n$-dimensional Hilbert space, $\mathcal{H}_{(n)}$. We always signify 
this on the Hadamard operator using a superscript, as in $H^{\otimes n}$. On the other hand, 
when we specify the order of the QFT, we do so using the superscript (N), as in 
"$\mathcal{QFT}^{(N)}$." In what follows, I'll continue to omit the superscript for the QFT initially 
and only bring it in when needed. 


How do these two operators compare on the CBS? 


nth Order Hadamard 


$$H^{\otimes n}|x\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^n \sum_{y=0}^{N-1} (-1)^{x \cdot y}\, |y\rangle^n,$$


where $x \cdot y = x \odot y$ is the mod-2 dot product based on the individual binary digits in 
the base-2 representation of x and y. Now $-1 = \omega_2$ ($e^{2\pi i/2} = e^{\pi i} = -1$, ✓), so let's 
replace the −1, above, with its symbol as the square root of unity, 


$$H^{\otimes n}|x\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^n \sum_{y=0}^{N-1} \omega_2^{\,x \cdot y}\, |y\rangle^n.$$


N-Dimensional QFT 


The definition of the Nth order QFT for $N = 2^n$ is 


$$\mathcal{QFT}^{(N)}|x\rangle^n = \left(\frac{1}{\sqrt{2}}\right)^n \sum_{y=0}^{N-1} \omega^{xy}\, |y\rangle^n.$$

The differences between the two operators are now apparent. 


• The QFT uses a primitive $2^n$th root of unity, while $H^{\otimes n}$ uses a square root of 
unity. 


• The exponent of the root-of-unity is an ordinary integer product for QFT, but 
is a mod-2 dot product for $H^{\otimes n}$. 


QFT Equals H for N = 2 


A useful factoid is that when N = 2 (n = 1) both operators are the same. Look: 


$$\mathcal{QFT}^{(2)}|x\rangle = \frac{1}{\sqrt{2}} \sum_{y=0}^{1} (-1)^{xy}\, |y\rangle = \frac{|0\rangle + (-1)^x |1\rangle}{\sqrt{2}} = \begin{cases} \dfrac{|0\rangle + |1\rangle}{\sqrt{2}}, & x = 0 \\[2ex] \dfrac{|0\rangle - |1\rangle}{\sqrt{2}}, & x = 1 \end{cases}$$

22.3.3 The Quantum Fourier Basis 


Any quantum gate is unitary by necessity (see the lecture on single qubits), and a 
unitary operator acting on an orthonormal basis produces another orthonormal basis. 
I'll restate the statute (from that lecture) that provides the rigor for all this. 


Theorem (Basis Conversion Property). If U is a unitary operator 
and A is an orthonormal basis, then U(A), i.e., the image of vectors A 
under U, is another orthonormal basis, B. 


Applying $\mathcal{QFT}^{(2^n)}$ to the preferred z-basis in $\mathcal{H}_{(n)}$ will give us another basis for $\mathcal{H}_{(n)}$. 
We'll write it as $\{ |\tilde{x}\rangle^n \}$, where 


$$|\tilde{x}\rangle^n \equiv \mathcal{QFT}^{(2^n)}\, |x\rangle^n.$$


We'll refer to this as the quantum Fourier basis or, once we are firmly back in purely 
quantum territory, simply as the Fourier basis or frequency basis. It will be needed 
in Shor's period-finding algorithm. 


The Take-Away 


Where we used $H^{\otimes n}$ to our advantage for Simon's $(\mathbb{Z}_2)^n$-periodicity, we anticipate 
using the QFT to achieve a similar effect when we work on Shor's ordinary integer 
periodicity. 


22.4 The QFT Circuit 


Going from an operator definition to an efficient quantum circuit (one with 
polynomial growth complexity) is always a challenge. We will approach the problem 
in bite-sized pieces. 


22.4.1 Notation 


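We'll review and establish some notation applicable to an $N = 2^n$-dimensional $\mathcal{H}_{(n)}$. 


The CBS kets in an N-dimensional $\mathcal{H}_{(n)}$ are officially tensor products of the 
individual CBSs of the 2-dimensional qubit spaces, 


$$|x\rangle^n = |x_{n-1}\rangle \otimes |x_{n-2}\rangle \otimes \cdots \otimes |x_1\rangle \otimes |x_0\rangle = \bigotimes_{k=0}^{n-1} |x_k\rangle,$$


where $|x_k\rangle$ is either $|0\rangle$ or $|1\rangle$ of the kth qubit space. 


We index in decreasing order from $x_{n-1}$ to $x_0$ because we'll want the right-most 
bit to correspond to the least significant bit of the binary number $x_{n-1} \ldots x_1 x_0$. 


One shorthand we have used in the past for this CBS is 

$$|x_{n-1}\rangle\, |x_{n-2}\rangle \cdots |x_1\rangle\, |x_0\rangle,$$

but we also know two other common variations, the encoded decimal integer x, 

$$|x\rangle^n, \qquad x \in \{ 0, 1, 2, 3, \ldots, 2^n - 1 \},$$

and its binary representation, 

$$|x_{n-1} \ldots x_1 x_0\rangle, \qquad x_k \in \{ 0, 1 \}.$$


For example, for n = 3, 


$$\begin{aligned}
|0\rangle^3 &\longleftrightarrow |000\rangle, \\
|1\rangle^3 &\longleftrightarrow |001\rangle, \\
|2\rangle^3 &\longleftrightarrow |010\rangle, \\
|3\rangle^3 &\longleftrightarrow |011\rangle, \\
|4\rangle^3 &\longleftrightarrow |100\rangle,
\end{aligned}$$

and, in general, 

$$|x\rangle^3 \longleftrightarrow |x_2 x_1 x_0\rangle.$$


We'll also use the natural consequence of this notation, 

$$x = \sum_{k=0}^{n-1} x_k\, 2^k.$$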


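22.4.2 The Math that Leads to the Circuit: Factoring the QFT 


We begin with the definition of the QFT's action on a CBS and gradually work 
towards the expression that will reveal the circuit. It will take a few screens, so here 
we go. (I'm going to move the constant $1/\sqrt{N}$ to the LHS to reduce syntax.) 


$$\begin{aligned}
(\sqrt{N})\, \mathcal{QFT}^{(N)}|x\rangle^n &= \sum_{y=0}^{N-1} \omega^{xy}\, |y\rangle^n \\
&= \sum_{y=0}^{N-1} \omega^{\,x \sum_{k=0}^{n-1} y_k 2^k}\, |y_{n-1} \ldots y_1 y_0\rangle \\
&= \sum_{y=0}^{N-1} \left( \prod_{k=0}^{n-1} \omega^{\,x y_k 2^k} \right) |y_{n-1} \ldots y_1 y_0\rangle
\end{aligned}$$


(I displayed the order, N, explicitly in the LHS's "$\mathcal{QFT}^{(N)}$," something that will 
come in handy in about two screens.) To keep the equations from overwhelming us, 
let's symbolize the inside product by 


$$\pi_{xy} \equiv \prod_{k=0}^{n-1} \omega^{\,x y_k 2^k},$$

providing a less intimidating 

$$(\sqrt{N})\, \mathcal{QFT}^{(N)}|x\rangle^n = \sum_{y=0}^{N-1} \pi_{xy}\, |y_{n-1} \ldots y_1 y_0\rangle.$$


To make this concrete, we pause to see what it looks like for n = 3. 

$$\begin{aligned}
(\sqrt{8})\, \mathcal{QFT}^{(8)}|x\rangle^3 &= \pi_{x \cdot 0}|000\rangle + \pi_{x \cdot 1}|001\rangle + \pi_{x \cdot 2}|010\rangle + \pi_{x \cdot 3}|011\rangle \\
&\quad + \pi_{x \cdot 4}|100\rangle + \pi_{x \cdot 5}|101\rangle + \pi_{x \cdot 6}|110\rangle + \pi_{x \cdot 7}|111\rangle
\end{aligned}$$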

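Separating a DFT into even and odd sub-arrays led to the FFT algorithm, and we 
try that here in the hope of a similar profit. 


$$\begin{aligned}
(\sqrt{8})\, \mathcal{QFT}^{(8)}|x\rangle^3 &= \pi_{x \cdot 0}|000\rangle + \pi_{x \cdot 2}|010\rangle + \pi_{x \cdot 4}|100\rangle + \pi_{x \cdot 6}|110\rangle \\
&\quad + \pi_{x \cdot 1}|001\rangle + \pi_{x \cdot 3}|011\rangle + \pi_{x \cdot 5}|101\rangle + \pi_{x \cdot 7}|111\rangle \\
&\equiv \left( \text{y-even group } \Sigma \right) + \left( \text{y-odd group } \Sigma \right).
\end{aligned}$$


Analysis of y-even Group 


The least significant bit, $y_0$, of all terms in the y-even group is always 0, so for terms 
in this group, 


$$\omega^{\,x y_0 2^0} = \omega^{\,x \cdot 0 \cdot 1} = 1,$$

so for even y we get 

$$\pi_{xy} = \prod_{k=0}^{2} \omega^{\,x y_k 2^k} = \prod_{k=1}^{2} \omega^{\,x y_k 2^k}.$$


Evidently, the $\pi_{xy}$ in the y-even group can start the product at k = 1 rather than 
k = 0 since the k = 0 factor is 1. We rewrite the even sum with this new knowledge: 


$$\begin{aligned}
\text{y-even group } \Sigma &= \pi_{x \cdot 0}|000\rangle + \pi_{x \cdot 2}|010\rangle + \pi_{x \cdot 4}|100\rangle + \pi_{x \cdot 6}|110\rangle \\
&= \Big( \pi_{x \cdot 0}|00\rangle + \pi_{x \cdot 2}|01\rangle + \pi_{x \cdot 4}|10\rangle + \pi_{x \cdot 6}|11\rangle \Big)\, |0\rangle \\
&= \left( \sum_{\substack{y = 0 \\ y \text{ even}}}^{7} \left( \prod_{k=1}^{2} \omega^{\,x y_k 2^k} \right) |y_2 y_1\rangle \right) |0\rangle
\end{aligned}$$


Now that $|y_0\rangle = |0\rangle$ has been factored from the sum we can run through the even y 
more efficiently by 


• halving the y-sum from $\sum_{y \text{ even}}^{7} \;\to\; \sum_{y=0}^{3}$, 


• replacing $|y_2 y_1\rangle \;\to\; |y_1 y_0\rangle$, 


• shifting the k-product down-by-1 so $\prod_{k=1}^{2} \;\to\; \prod_{k=0}^{1}$, and 


• replacing $2^k \;\to\; 2^{k+1}$. 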


(Take a little time to see why these adjustments make sense.) Applying the bullets 
gives 


$$\begin{aligned}
\text{y-even group } \Sigma &= \left( \sum_{y=0}^{3} \left( \prod_{k=0}^{1} \omega^{\,x y_k 2^{k+1}} \right) |y_1 y_0\rangle \right) |0\rangle \\
&= \left( \sum_{y=0}^{3} \left( \prod_{k=0}^{1} \left( \omega^2 \right)^{x y_k 2^k} \right) |y_1 y_0\rangle \right) |0\rangle.
\end{aligned}$$


There's one final reduction to be made. While we successfully halved the size of y 
inside the kets from its original 0 → 7 range to the smaller interval 0 → 3, the x in 
the exponent still roams free in the original set. How do we get it to live in the same 
smaller world as y? The key lurks here: 


$$\left( \omega^2 \right)^{x y_k 2^k}$$

The even sub-array rearrangement precipitated a 4th root of unity, $\omega^2$, rather than 
the original 8th root, $\omega$. This enables us to replace any x > 3 with x − 4, bringing it 
back into the 0 → 3 range without affecting the computed values. To see why, do the 
following short exercise. 


[Exercise. For 4 ≤ x ≤ 7 write x = 4 + p, where 0 ≤ p ≤ 3. Plug 4 + p in for x 
in the above exponent and simplify, leveraging the fact that $\omega = \sqrt[8]{1}$.] 


The bottom line is that we can replace x with (x mod 4) and the equality 
still holds true, 


$$\text{y-even group } \Sigma = \left( \sum_{y=0}^{3} \left( \prod_{k=0}^{1} \left( \omega^2 \right)^{(x \bmod 4)\, y_k 2^k} \right) |y_1 y_0\rangle \right) |0\rangle.$$


Using the same logic on general n, we get 


$$\text{y-even group } \Sigma = \left( \sum_{y=0}^{N/2 - 1} \left( \prod_{k=0}^{n-2} \left( \omega^2 \right)^{(x \bmod N/2)\, y_k 2^k} \right) |y_{n-2}\, y_{n-3} \ldots y_0\rangle \right) |0\rangle.$$

Compare with our latest retooling of the QFT, 

$$(\sqrt{N})\, \mathcal{QFT}^{(N)}|x\rangle^n = \sum_{y=0}^{N-1} \left( \prod_{k=0}^{n-1} \omega^{\,x y_k 2^k} \right) |y_{n-1} \ldots y_1 y_0\rangle,$$


and we are encouraged to see the order-N/2 QFT staring us in the face, 

$$\text{y-even group } \Sigma = \left( \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}\, |x \bmod (N/2)\rangle^{n-1} \right) |0\rangle.$$


Let's make this as concise as possible by using $\tilde{x}$ to mean x mod (N/2). 


$$\text{y-even group } \Sigma = \left( \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}\, |\tilde{x}\rangle^{n-1} \right) |0\rangle.$$


We've expressed the even group as a QFT whose order is N/2, half the original N. 
The scent of recursion (and success) is in the air. Now, let's take a stab at the odd 
group. 


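Analysis of y-odd Group 


The least significant bit, $y_0$, of all the terms in the y-odd group is always 1, so for 
terms in this group, 


$$\omega^{\,x y_0 2^0} = \omega^{\,x \cdot 1 \cdot 1} = \omega^x,$$

so for odd y we get 

$$\pi_{xy} = \prod_{k=0}^{2} \omega^{\,x y_k 2^k} = \omega^x \prod_{k=1}^{2} \omega^{\,x y_k 2^k}.$$


We separated the factor $\omega^x$ from the rest so that we could start the product at k = 1 
to align our analysis with the y-even group, above. We rewrite the odd sum using 
this adjustment: 


$$\begin{aligned}
\text{y-odd group } \Sigma &= \pi_{x \cdot 1}|001\rangle + \pi_{x \cdot 3}|011\rangle + \pi_{x \cdot 5}|101\rangle + \pi_{x \cdot 7}|111\rangle \\
&= \Big( \pi_{x \cdot 1}|00\rangle + \pi_{x \cdot 3}|01\rangle + \pi_{x \cdot 5}|10\rangle + \pi_{x \cdot 7}|11\rangle \Big)\, |1\rangle \\
&= \omega^x \left( \sum_{\substack{y = 0 \\ y \text{ odd}}}^{7} \left( \prod_{k=1}^{2} \omega^{\,x y_k 2^k} \right) |y_2 y_1\rangle \right) |1\rangle
\end{aligned}$$


Now that $|y_0\rangle = |1\rangle$ and $\omega^x$ have both been factored from the sum, we run through 
the odd y by 


• halving the y-sum from $\sum_{y \text{ odd}}^{7} \;\to\; \sum_{y=0}^{3}$, 


• replacing $|y_2 y_1\rangle \;\to\; |y_1 y_0\rangle$, 


• shifting the k-product down-by-1 so $\prod_{k=1}^{2} \;\to\; \prod_{k=0}^{1}$, and 


• replacing $2^k \;\to\; 2^{k+1}$. 


These bullets give us 


$$\text{y-odd group } \Sigma = \omega^x \left( \sum_{y=0}^{3} \left( \prod_{k=0}^{1} \left( \omega^2 \right)^{x y_k 2^k} \right) |y_1 y_0\rangle \right) |1\rangle,$$


and we follow it by the same replacement, x → (x mod 4), that worked for the y-even 
group (and works here, too): 


$$\text{y-odd group } \Sigma = \omega^x \left( \sum_{y=0}^{3} \left( \prod_{k=0}^{1} \left( \omega^2 \right)^{(x \bmod 4)\, y_k 2^k} \right) |y_1 y_0\rangle \right) |1\rangle.$$


Using the same logic on general n, we get 


$$\text{y-odd group } \Sigma = \omega^x \left( \sum_{y=0}^{N/2 - 1} \left( \prod_{k=0}^{n-2} \left( \omega^2 \right)^{(x \bmod N/2)\, y_k 2^k} \right) |y_{n-2}\, y_{n-3} \ldots y_0\rangle \right) |1\rangle.$$


Once again, we are thrilled to see an (N/2)-order QFT emerge from the fray, 

$$\text{y-odd group } \Sigma = \omega^x \left( \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}\, |\tilde{x}\rangle^{n-1} \right) |1\rangle,$$

where, as before, $\tilde{x}$ means x mod (N/2). 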


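The Recursion Relation 


Combine the y-even group Σ and y-odd group Σ to get 

$$\begin{aligned}
(\sqrt{N})\, \mathcal{QFT}^{(N)}|x\rangle^n &= \left( \text{y-even group } \Sigma \right) + \left( \text{y-odd group } \Sigma \right) \\
&= \left( \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}|\tilde{x}\rangle^{n-1} \right) |0\rangle \;+\; \omega^x \left( \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}|\tilde{x}\rangle^{n-1} \right) |1\rangle \\
&= \sqrt{\tfrac{N}{2}}\; \mathcal{QFT}^{(N/2)}|\tilde{x}\rangle^{n-1}\, \Big( |0\rangle + \omega^x |1\rangle \Big),
\end{aligned}$$

where $\tilde{x}$ means x mod (N/2). 


The binomial $|0\rangle + \omega^x|1\rangle$ had to end up on the right of the sum because we were 
peeling off the least-significant $|0\rangle$ and $|1\rangle$ in the even-odd analysis; tensor products 
are not commutative. This detail leads to a slightly annoying but easily handled 
wrinkle in the end. You'll see. 


Dividing out the $\sqrt{N}$, using $2^n$ for N and rearranging, we get an even clearer 
picture. 


$$\mathcal{QFT}^{(2^n)}|x\rangle^n = \mathcal{QFT}^{(2^{n-1})}|\tilde{x}\rangle^{n-1} \left( \frac{|0\rangle + \omega_{2^n}^{\,x}\, |1\rangle}{\sqrt{2}} \right)$$


Compare this with the Danielson-Lanczos Recursion Relation for the DFT which we 
turned into an FFT by unwinding recursion. In our current context it's even easier, 
because we have only one, not two, recursive calls to unwind. 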
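The recursion relation can be sanity-checked numerically by comparing amplitudes on both sides. The sketch below is a classical simulation under the assumption that the right-most tensor factor corresponds to the least significant output bit $y_0$; qftOfBasisKet() and recursionRhs() are hypothetical names, and N is assumed to be a power of 2 with N ≥ 2.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Amplitudes of QFT^(N)|x>, straight from the CBS definition:
// component y is (1/sqrt(N)) * omega_N^(x y).
std::vector<Cplx> qftOfBasisKet(std::size_t N, std::size_t x)
{
    const double PI = std::acos(-1.0);
    const double scale = 1.0 / std::sqrt(double(N));
    std::vector<Cplx> out(N);
    for (std::size_t y = 0; y < N; ++y)
        out[y] = std::polar(scale, 2.0 * PI * double((x * y) % N) / double(N));
    return out;
}

// Right-hand side of the recursion relation:
//    QFT^(N/2)|x mod N/2>  (tensor)  (|0> + omega_N^x |1>) / sqrt(2)
std::vector<Cplx> recursionRhs(std::size_t N, std::size_t x)
{
    const double PI = std::acos(-1.0);
    const std::vector<Cplx> lower = qftOfBasisKet(N / 2, x % (N / 2));
    const Cplx wx = std::polar(1.0, 2.0 * PI * double(x) / double(N));
    std::vector<Cplx> out(N);
    for (std::size_t y = 0; y < N; ++y) {
        // y = 2*y' + y0: the upper bits y' index the lower-order QFT,
        // and the last bit y0 selects |0> or |1> in the right-most factor
        const Cplx last = (y & 1) ? wx : Cplx(1.0, 0.0);
        out[y] = lower[y >> 1] * last / std::sqrt(2.0);
    }
    return out;
}
```

Comparing qftOfBasisKet(8, x) with recursionRhs(8, x) for x = 0, ..., 7 exercises the n = 3 case worked in the text.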


If we apply the same math to the lower-order QFT on the RHS and plug the 
result into the last equation, using $\tilde{\tilde{x}}$ for $\tilde{x}$ mod (N/4), we find 


$$\mathcal{QFT}^{(2^n)}|x\rangle^n = \mathcal{QFT}^{(2^{n-2})}|\tilde{\tilde{x}}\rangle^{n-2} \left( \frac{|0\rangle + \omega_{2^{n-1}}^{\,x}\, |1\rangle}{\sqrt{2}} \right) \left( \frac{|0\rangle + \omega_{2^n}^{\,x}\, |1\rangle}{\sqrt{2}} \right).$$


Now let recursion off its leash. Each iteration pulls a factor of $\left( |0\rangle + \omega_{2^k}^{\,x} |1\rangle \right)$ out 
(and to the right) of the lower-dimensional $\mathcal{QFT}^{(2^k)}$ until we get to $\mathcal{QFT}^{(2)}$, which 
would be the final factor on the left, 


$$\mathcal{QFT}^{(2^n)}|x\rangle^n = \prod_{k=1}^{n} \left( \frac{|0\rangle + \omega_{2^k}^{\,x}\, |1\rangle}{\sqrt{2}} \right).$$


First, admire the disappearance of those pesky $\tilde{x}$ factors, so any anxiety about x 
mod N/k is now lifted. Next, note that the RHS is written in terms of different 
roots-of-unity, $\omega_2, \omega_4, \ldots, \omega_{2^n} = \omega$. However, they can all be written as powers of 
$\omega = \omega_N$, 


$$\omega_{2^k} = \omega^{2^{n-k}}.$$


For example, when k = 1 we have $\omega_2 = (-1)$, which is $\omega^{2^{n-1}}$. Using this, we write 
the Nth order QFT in terms of the Nth root-of-unity $\omega$, 


$$\mathcal{QFT}^{(N)}|x\rangle^n = \prod_{k=1}^{n} \left( \frac{|0\rangle + \omega^{2^{n-k} x}\, |1\rangle}{\sqrt{2}} \right).$$


Interesting Observation. When N = 2 (n = 1), we know QFT = H, so the 
first factor (the last one that got peeled off, on the far left) is literally $H|x_0\rangle$ (by 
that time we would be taking x mod 2, which is the least significant bit of x, $x_0$). This 
final, far left factor is 


$$H|x_0\rangle = \frac{1}{\sqrt{2}} \Big( |0\rangle + (-1)^{x_0}\, |1\rangle \Big),$$

and explains why the $|\tilde{x}\rangle$, $|\tilde{\tilde{x}}\rangle$, etc. terms disappeared from the RHS; each factor 
gobbled up another qubit from the tensor product, and the final, smallest, QFT 
turns the remaining single qubit, $|x_0\rangle$, into the Hadamard superposition. 
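The fully factored product can be verified against the CBS definition by multiplying out the n single-qubit factors. This sketch assumes the left-most factor is the most significant output qubit, as in the text; kron(), qftFactored() and qftDirect() are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Tensor (Kronecker) product of two amplitude vectors; the left argument
// supplies the more significant bits of the combined index.
std::vector<Cplx> kron(const std::vector<Cplx>& a, const std::vector<Cplx>& b)
{
    std::vector<Cplx> out(a.size() * b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i * b.size() + j] = a[i] * b[j];
    return out;
}

// Factored form: product over k = 1..n of
//    (|0> + omega^(2^(n-k) x) |1>) / sqrt(2),   omega = exp(2 pi i / 2^n).
std::vector<Cplx> qftFactored(unsigned n, std::size_t x)
{
    const double PI = std::acos(-1.0);
    const double r = 1.0 / std::sqrt(2.0);
    const std::size_t N = std::size_t(1) << n;
    std::vector<Cplx> state = { Cplx(1.0, 0.0) };
    for (unsigned k = 1; k <= n; ++k) {
        // exponent 2^(n-k) * x, reduced mod N since omega^N = 1
        const std::size_t e = ((std::size_t(1) << (n - k)) * x) % N;
        const Cplx phase = std::polar(1.0, 2.0 * PI * double(e) / double(N));
        const std::vector<Cplx> factor = { Cplx(r, 0.0), phase * r };
        state = kron(state, factor);   // each new factor goes on the right
    }
    return state;
}

// CBS definition, for comparison: component y is (1/sqrt(N)) omega^(x y).
std::vector<Cplx> qftDirect(unsigned n, std::size_t x)
{
    const double PI = std::acos(-1.0);
    const std::size_t N = std::size_t(1) << n;
    std::vector<Cplx> out(N);
    for (std::size_t y = 0; y < N; ++y)
        out[y] = std::polar(1.0 / std::sqrt(double(N)),
                            2.0 * PI * double((x * y) % N) / double(N));
    return out;
}
```

The factored form needs only n phases per input x, whereas the CBS expansion lists all $2^n$ amplitudes; the check simply confirms that they describe the same state.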


Notation and Discussion 


Products between kets are tensor products. Each factor in the final product expressing 
$\mathcal{QFT}^{(N)}$ is a single superposition qubit consisting of a mixture of $|0\rangle$ and $|1\rangle$ in its 
own component space. Meanwhile, there are n of those small superpositions creating 
the final tensor product. It is sometimes written 


$$\mathcal{QFT}^{(N)}|x\rangle^n = \bigotimes_{k=0}^{n-1} \left( \frac{|0\rangle + \omega^{2^{n-k-1} x}\, |1\rangle}{\sqrt{2}} \right).$$


I won't use this notation, since it scares people, and when you multiply kets by 
kets everyone knows that tensors are implied. So, I'll use the $\prod$ notation for all 
products, and you can infuse the tensor interpretation mentally when you see that 
the components are all qubits. 


However, it's still worth remarking that this is a tensor product of kets from the 
individual 2-dimensional $\mathcal{H}$ spaces (of which there are n) and as such results in a 
separable state in the N-dimensional $\mathcal{H}_{(n)}$. This is a special way, different from the 
expansion along the CBS, to express a state in this high dimensional Hilbert space. 
But you should not be left with the impression that we were entitled to find a factored 
representation. Most states in $\mathcal{H}_{(n)}$ cannot be factored; they're not separable. The 
result we derived is that when taking the QFT of a CBS we happily end up with a 
separable state. 


The factored representation and the CBS expansion each give different information 
about the output state, and it may not always be obvious how the coefficients or 
factors of the two relate (without doing the math). 


A simple example is the equivalence of a factored representation and the CBS 
expansion of the following $|\psi\rangle^2$ in a two qubit system (n = 2, N = 4): 

$$|\psi\rangle^2 = \frac{|0\rangle|0\rangle}{\sqrt{2}} + \frac{|0\rangle|1\rangle}{\sqrt{2}} = |0\rangle \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right).$$

For the QFT, we have both the CBS definition of $\mathcal{QFT}\,|x\rangle$ and the separable view. 
In the N-dimensional case, the two different forms can be shown side-by-side, 


$$\mathcal{QFT}^{(N)}|x\rangle^n = \prod_{k=1}^{n} \left( \frac{|0\rangle + \omega^{2^{n-k} x}\, |1\rangle}{\sqrt{2}} \right) = \left( \frac{1}{\sqrt{2}} \right)^n \sum_{y=0}^{2^n - 1} \omega^{xy}\, |y\rangle^n.$$


Of course, there can only be (at most) n factors in the separable factorization, while 
there will be up to $2^n$ terms in the CBS expansion. 


The reason I bring this up is that the (separable) factorization is more relevant 
to the QFT than it was to the FFT because we are basing our quantum work on 
the supposition that there will be quantum gates in the near future. These gates 
are unitary operators applied to the input CBS qubit-by-qubit, which is essentially a 
tensor product construction. 


Let's see how we can construct an actual QFT circuit from such unitary operators. 


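22.4.3 The QFT Circuit from the Math: n = 3 Case Study 

In the case of n = 3, the factored QFT is 

$$\mathcal{QFT}^{(8)}|x\rangle^3 = \left( \frac{|0\rangle + \omega^{4x} |1\rangle}{\sqrt{2}} \right) \left( \frac{|0\rangle + \omega^{2x} |1\rangle}{\sqrt{2}} \right) \left( \frac{|0\rangle + \omega^{x} |1\rangle}{\sqrt{2}} \right).$$


Good things come to those who calmly examine each factor, separately. Work from 
left (most-significant output qubit) to right (least-significant output qubit). 


The First (Most-Significant) Output Factor 


We already know that this is $H|x_0\rangle$ but we'll want to re-derive that fact in a way 
that can be used as a template for the other two factors. $\omega$ is an 8th root of unity, 
so the coefficient of $|1\rangle$ in the numerator can be derived from 


$$\begin{aligned}
\omega^{4x} &= \omega^{4(4x_2 + 2x_1 + x_0)} = \omega^{16 x_2}\, \omega^{8 x_1}\, \omega^{4 x_0} \\
&= 1 \cdot 1 \cdot (-1)^{x_0} = (-1)^{x_0},
\end{aligned}$$

which means 

$$\left( \frac{|0\rangle + \omega^{4x} |1\rangle}{\sqrt{2}} \right) = \left( \frac{|0\rangle + (-1)^{x_0} |1\rangle}{\sqrt{2}} \right) = \begin{cases} \dfrac{|0\rangle + |1\rangle}{\sqrt{2}}, & x_0 = 0 \\[2ex] \dfrac{|0\rangle - |1\rangle}{\sqrt{2}}, & x_0 = 1 \end{cases} \;=\; H|x_0\rangle.$$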

This was the most-significant qubit factor of the output ket (the one on the far left of
the product). Let’s refer to the output ket as $|\tilde{x}\rangle$ and its most significant separable
factor (the one at the far left of our product) as $|\tilde{x}_2\rangle$. We can then rewrite the last
equation as

\[
|\tilde{x}_2\rangle \;=\; H|x_0\rangle .
\]


[Don’t be lulled into thinking this is a computational basis element, though. Unlike
the input state, $|x\rangle = |x_2\rangle|x_1\rangle|x_0\rangle$, which is a product of CBS and therefore itself a
tensor CBS, the output, $|\tilde{x}\rangle$, while a product of states, to be sure, is not composed of
factors which are CBS in their 2-D homes. Therefore, the product, $|\tilde{x}\rangle$, is not a CBS
in the $2^n$-dimensional product space.]


Summary: By expressing x as powers-of-2 in the most-significant output factor,
$|\tilde{x}_2\rangle$, we were able to watch the higher powers dissolve because they turned $\omega$ into 1.
That left only the lowest power of $\omega$, namely $\omega^4 = (-1)$, which, in turn, produced a
“Hadamard effect” on the least significant bit of the input ket, $|x_0\rangle$. We’ll do this for
the other two factors with the sober acceptance that each time, fewer high powers
will disappear.


But first, let’s stand back and admire our handiwork. We have an actual circuit
element that generates the most significant separable factor for QFT $|x\rangle$:


    |x_0⟩ ---[ H ]--- |x̃_2⟩ ✓


First, wow. Second, do you remember that I said pulling the least-significant kets
toward the right during factorization would introduce a small wrinkle? You're looking
at it. The output state’s most-significant factor, $|\tilde{x}_2\rangle$, is derived from the input state’s
least significant ket, $|x_0\rangle$. Make a mental note that this will necessitate a reversal of
the kets once we have the output of the circuit; following the input line for qubit $|x_0\rangle$
leads not to the output ket’s 0th factor, as we might have hoped, but rather to the
output ket’s (n − 1)st factor.


The Middle Factor 


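Again, we remain aware that $\omega$ is an 8th root of unity, making the coefficient of $|1\rangle$
in the numerator of the middle factor

\[
\omega^{2x} \;=\; \omega^{2(4x_2 + 2x_1 + x_0)}
\;=\; \omega^{8x_2}\,\omega^{4x_1}\,\omega^{2x_0}
\;=\; 1 \cdot (-1)^{x_1} \cdot (i)^{x_0} \;=\; (-1)^{x_1}\,(i)^{x_0},
\]

which means

\[
\left( \frac{|0\rangle + \omega^{2x}|1\rangle}{\sqrt{2}} \right)
= \left( \frac{|0\rangle + (-1)^{x_1}(i)^{x_0}|1\rangle}{\sqrt{2}} \right)
= \begin{cases}
H|x_1\rangle, & x_0 = 0 \\[1ex]
\dfrac{|0\rangle + i\,(-1)^{x_1}|1\rangle}{\sqrt{2}}, & x_0 = 1 .
\end{cases}
\]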


• The good news is that if $x_0 = 0$, then the factor $i^{x_0}$ becomes 1 and we are left
with a Hadamard operator applied to the middle input ket, $|x_1\rangle$.


• The bad news is that if $x_0 = 1$, we see no obvious improvement in the formula.


Fixing the bad news of the second item is where I need you to focus all your attention and
patience, as it is the key to everything and takes only a few more neurons. Let’s go
ahead and take $H|x_1\rangle$, regardless of whether $x_0$ is 0 or 1. If $x_0$ was 0, we guessed
right, but if it was 1, what do we have to do to patch things up? Not much, it turns
out.


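Let’s compare the actual state we computed (wrong, if $x_0$ was 1) with the one we
wanted (right, no matter what) and see how they differ. Writing them in coordinate
form will do us a world of good.


What we got when applying H to $|x_1\rangle$ ...

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_1} \end{pmatrix}
\]

... but if $x_0 = 1$, we really wanted:

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i\,(-1)^{x_1} \end{pmatrix}
\]

How do we transform the first into the second? Answer: multiply by

\[
R_1 \;\equiv\; \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} :
\qquad
\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_1} \end{pmatrix}
= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i\,(-1)^{x_1} \end{pmatrix} .
\]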


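Now we have the more pleasant formula for the second factor,

\[
\left( \frac{|0\rangle + \omega^{2x}|1\rangle}{\sqrt{2}} \right)
= \begin{cases}
H|x_1\rangle, & x_0 = 0 \\
R_1 H|x_1\rangle, & x_0 = 1 .
\end{cases}
\]


A Piece of the Circuit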


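We found that the two most-significant factors of the (happily separable) QFT $|x\rangle$
could be computed using the formulas

\[
|\tilde{x}_2\rangle = H|x_0\rangle, \quad \text{and} \qquad
|\tilde{x}_1\rangle =
\begin{cases}
H|x_1\rangle, & x_0 = 0 \\
R_1 H|x_1\rangle, & x_0 = 1 .
\end{cases}
\]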


In words: 


1. We apply H to the two least significant kets, $|x_1\rangle$ and $|x_0\rangle$, unconditionally, since
they will always be used in the computation of the final two most-significant
factors of QFT $|x\rangle$.


2. We conditionally apply another operator, $R_1$, to the result of $H|x_1\rangle$ in the
eventuality that $x_0 = 1$.


3. Although we apply all this to the two least significant input kets, $|x_1\rangle|x_0\rangle$,
what we get is the most-significant portion of the output state’s factorization,
$|\tilde{x}_2\rangle|\tilde{x}_1\rangle$ (not the least-significant, so we must be prepared to do some swapping
before the day is done).


Item 2 suggests using a controlled-$R_1$ gate, where bit $x_0$ is the control. If $x_0 = 0$,
the operator being controlled is not applied, but if $x_0 = 1$, it is. This leads to the
following circuit element:


    |x_1⟩ ----[ R_1 ]----
                 |
    |x_0⟩ -------●-------


Let’s add the remaining components one-at-a-time. First, we want to apply the
unconditional Hadamard gate to $|x_1\rangle$. As our formulas indicate, this is done before $R_1$
($R_1 H|x_1\rangle$ is applied right-to-left). Adding this element, we get:


    |x_1⟩ ---[ H ]---[ R_1 ]--- |x̃_1⟩ ✓
                        |
    |x_0⟩ --------------●------


This computes our final value for the $|\tilde{x}_1\rangle$ factor.


We have yet to add back in the Hadamard applied to $|x_0\rangle$. Here the order is
important. We have to make sure we do this after $x_0$ is used to control $x_1$’s $R_1$ gate.
Were we to apply H to $x_0$ before using it to control $R_1$, $x_0$ would no longer be there;
it would have been replaced by the Hadamard superposition. So we place the H-gate
to the right of the control vertex:


    |x_1⟩ ---[ H ]---[ R_1 ]------------- |x̃_1⟩ ✓
                        |
    |x_0⟩ --------------●------[ H ]----- |x̃_2⟩ ✓


That completes the circuit element for the two most-significant separable output 
factors. We can now get back to analyzing the logic of our instructional n = 3 case 
and see how we can incorporate the last of our three factors. 


The Last (Least-Significant) Factor 


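The rightmost output factor, $|\tilde{x}_0\rangle$, has an $\omega$ whose exponent, x, carries no extra
multiplier in the numerator,

\[
\omega^{x} \;=\; \omega^{(4x_2 + 2x_1 + x_0)}
\;=\; \omega^{4x_2}\,\omega^{2x_1}\,\omega^{x_0}
\;=\; (\omega^{4})^{x_2}\,(\omega^{2})^{x_1}\,(\omega)^{x_0}
\;=\; (-1)^{x_2}\,(i)^{x_1}\,(\omega)^{x_0},
\]

which means

\[
\left( \frac{|0\rangle + \omega^{x}|1\rangle}{\sqrt{2}} \right)
= \left( \frac{|0\rangle + (-1)^{x_2}(i)^{x_1}(\omega)^{x_0}|1\rangle}{\sqrt{2}} \right)
= \begin{cases}
\dfrac{|0\rangle + (-1)^{x_2}(i)^{x_1}|1\rangle}{\sqrt{2}}, & x_0 = 0 \\[2ex]
\dfrac{|0\rangle + \omega\,(-1)^{x_2}(i)^{x_1}|1\rangle}{\sqrt{2}}, & x_0 = 1 .
\end{cases}
\]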


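This time, while the output factor does not reduce to something as simple as $H|x_2\rangle$
in every case, when $x_0 = 0$ it does look like the expression we had for the middle
factor, except applied here to $|x_2\rangle|x_1\rangle$ rather than the $|x_1\rangle|x_0\rangle$ of the middle factor.
In other words, when $x_0 = 0$ this least significant factor reduces to

\[
\left( \frac{|0\rangle + (-1)^{x_2}(i)^{x_1}|1\rangle}{\sqrt{2}} \right),
\]

while the middle factor was in its entirety

\[
\left( \frac{|0\rangle + (-1)^{x_1}(i)^{x_0}|1\rangle}{\sqrt{2}} \right).
\]

This suggests that, if $x_0 = 0$, we apply the same exact logic to $|x_2\rangle|x_1\rangle$ that we used
for $|x_1\rangle|x_0\rangle$ in the middle case. That logic would be (if $x_0 = 0$)

\[
\left( \frac{|0\rangle + \omega^{x}|1\rangle}{\sqrt{2}} \right)
= \begin{cases}
H|x_2\rangle, & x_1 = 0 \\
R_1 H|x_2\rangle, & x_1 = 1 .
\end{cases}
\]

Therefore, in the special case where $x_0 = 0$, the circuit that works for $|\tilde{x}_0\rangle$ looks like
the one that worked for $|\tilde{x}_1\rangle$, applied, this time, to qubits 1 and 2:


    |x_2⟩ ---[ H ]---[ R_1 ]---
                        |
    |x_1⟩ --------------●------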


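To patch this up, we have to adjust for the case in which $x_0 = 1$. The state we just
generated with this circuit was

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_2}\,(i)^{x_1} \end{pmatrix}
\]

... but if $x_0 = 1$, we really wanted:

\[
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_2}\,(i)^{x_1}\,\omega \end{pmatrix}
\]

How do we transform the first into the second? Answer: multiply by

\[
R_2 \;\equiv\; \begin{pmatrix} 1 & 0 \\ 0 & \omega \end{pmatrix} :
\qquad
\begin{pmatrix} 1 & 0 \\ 0 & \omega \end{pmatrix}
\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_2}(i)^{x_1} \end{pmatrix}
= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ (-1)^{x_2}(i)^{x_1}\,\omega \end{pmatrix} .
\]

(Remember, this is in the case when $x_0 = 1$, so $\omega = \omega^{x_0}$.) This gives us the complete
formula for the rightmost output factor, $|\tilde{x}_0\rangle$,

\[
\left( \frac{|0\rangle + \omega^{x}|1\rangle}{\sqrt{2}} \right)
= \begin{cases}
H|x_2\rangle, & x_0 = 0,\ x_1 = 0 \\
R_1 H|x_2\rangle, & x_0 = 0,\ x_1 = 1 \\
R_2 H|x_2\rangle, & x_0 = 1,\ x_1 = 0 \\
R_2 R_1 H|x_2\rangle, & x_0 = 1,\ x_1 = 1 .
\end{cases}
\]

In words, we took the tentative circuit that we designed for $|\tilde{x}_0\rangle$ under the assumption
that $x_0 = 0$ but tagged on a correction if $x_0 = 1$. That amounts to our newly
introduced operator $R_2$ controlled by $x_0$, so that the result would be further multiplied
by $R_2$.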
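The three case formulas can be checked numerically. The sketch below is an illustration only: NumPy and the helper names `output_factors` and `qft_direct` are assumptions, not the author's code. It builds each output factor from $H$, $R_1 = \mathrm{diag}(1, i)$ and $R_2 = \mathrm{diag}(1, \omega)$ exactly as the cases prescribe, tensors them left to right, and compares the result with the CBS expansion of QFT $|x\rangle$ for N = 8.

```python
import numpy as np
from functools import reduce

omega = np.exp(2j * np.pi / 8)                 # primitive 8th root of unity
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
R1 = np.diag([1, 1j])                          # lower-right element i = omega**2
R2 = np.diag([1, omega])                       # lower-right element omega
KET = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]

def output_factors(x2, x1, x0):
    """Return the factors (x~_2, x~_1, x~_0) given by the case formulas."""
    power = np.linalg.matrix_power             # power 0 gives the identity
    xt2 = H @ KET[x0]
    xt1 = power(R1, x0) @ H @ KET[x1]
    xt0 = power(R2, x0) @ power(R1, x1) @ H @ KET[x2]
    return xt2, xt1, xt0

def qft_direct(x, N=8):
    """CBS expansion (1/sqrt(N)) * sum_y omega^(x*y) |y>."""
    return np.array([omega ** (x * y) for y in range(N)]) / np.sqrt(N)

matches = []
for x in range(8):
    x2, x1, x0 = (x >> 2) & 1, (x >> 1) & 1, x & 1
    product = reduce(np.kron, output_factors(x2, x1, x0))
    matches.append(np.allclose(product, qft_direct(x)))
```

Raising $R_1$ or $R_2$ to the power of the controlling bit is just a compact way to say "apply the gate only when that bit is 1", which is what a controlled gate does on a basis input.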

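The Circuit Element for $|\tilde{x}_0\rangle$, in Isolation


As long as we only consider $|\tilde{x}_0\rangle$, we can easily use this new information to patch
up the most recent $x_0 = 0$ case. We simply add an $|x_0\rangle$ at the bottom of the picture,
and use it to control an $R_2$ gate, applied to the end of the $|x_2\rangle \to |\tilde{x}_0\rangle$ assembly line.


    |x_2⟩ ---[ H ]---[ R_1 ]---[ R_2 ]--- |x̃_0⟩ ✓
                        |         |
    |x_1⟩ --------------●---------|------
                                  |
    |x_0⟩ ------------------------●------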


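The Full Circuit for N = 8 (n = 3)


In the previous section, we obtained the exact result for the least-significant output
factor, $|\tilde{x}_0\rangle$.


    |x_2⟩ ---[ H ]---[ R_1 ]---[ R_2 ]--- |x̃_0⟩ ✓
                        |         |
    |x_1⟩ --------------●---------|------
                                  |
    |x_0⟩ ------------------------●------


In the section prior, we derived the circuit for the two most significant output factors,
$|\tilde{x}_1\rangle$ and $|\tilde{x}_2\rangle$.


    |x_1⟩ ---[ H ]---[ R_1 ]------------- |x̃_1⟩ ✓
                        |
    |x_0⟩ --------------●------[ H ]----- |x̃_2⟩ ✓


All that’s left to do is combine them. The precaution we take is to defer applying any
operator to an input ket until after that ket has been used to control any R-gates
needed by its siblings. That suggests that we place the $|\tilde{x}_1\rangle|\tilde{x}_2\rangle$ circuit elements to
the right of the $|\tilde{x}_0\rangle$ circuit element, and so we do.


    |x_2⟩ ---[ H ]---[ R_1 ]---[ R_2 ]--------------------------------- |x̃_0⟩ ✓
                        |         |
    |x_1⟩ --------------●---------|------[ H ]---[ R_1 ]--------------- |x̃_1⟩ ✓
                                  |                 |
    |x_0⟩ ------------------------●-----------------●------[ H ]------- |x̃_2⟩ ✓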


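Prior to celebration, we have to symbolize the somewhat trivial circuitry for re-ordering
the output. While trivial, it has a linear (in n = log N) cost, but it adds
nothing to the time complexity, as we'll see. (In the picture, the × marks joined by a
vertical line denote the exchange of the top and bottom lines.)


    |x_2⟩ ---[ H ]---[ R_1 ]---[ R_2 ]-----------------------------------×--- |x̃_2⟩ ✓
                        |         |                                      |
    |x_1⟩ --------------●---------|------[ H ]---[ R_1 ]-----------------|--- |x̃_1⟩ ✓
                                  |                 |                    |
    |x_0⟩ ------------------------●-----------------●------[ H ]---------×--- |x̃_0⟩ ✓


You are looking at the complete QFT circuit for a 3-qubit system.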


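Before we leave this case, let’s make one notational observation. We defined $R_1$
to be the matrix that “patched-up” the $|\tilde{x}_1\rangle$ factor, and $R_2$ to be the matrix that
“patched-up” the $|\tilde{x}_0\rangle$ factor. Let’s look at those gates along with the only other gate
we needed, H:

\[
R_1 \equiv \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad
R_2 \equiv \begin{pmatrix} 1 & 0 \\ 0 & \omega \end{pmatrix}, \qquad
H \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
\]

The lower right-hand element of each matrix is a root-of-unity, so let’s see all three
matrices again, this time with that lower right element expressed as a power of $\omega$:

\[
R_1 = \begin{pmatrix} 1 & 0 \\ 0 & \omega^{2} \end{pmatrix}, \qquad
R_2 = \begin{pmatrix} 1 & 0 \\ 0 & \omega^{1} \end{pmatrix}, \qquad
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & \omega^{4} \end{pmatrix}.
\]

This paves the way to generalizing to a QFT of any size.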


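22.4.4 The QFT Circuit from the Math: General Case


Minus the re-ordering component at the far right, here’s the QFT circuit we designed
for n = 3:


    |x_2⟩ ---[ H ]---[ R_1 ]---[ R_2 ]---------------------------------
                        |         |
    |x_1⟩ --------------●---------|------[ H ]---[ R_1 ]---------------
                                  |                 |
    |x_0⟩ ------------------------●-----------------●------[ H ]-------


[Exercise. Go through the steps that got us this circuit, but add a fourth qubit to
get the n = 4 (QFT) circuit.]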


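It doesn’t take too much imagination to guess what the circuit would be for any n:


    |x_{n-1}⟩ --[ H ]--[ R_1 ]--[ R_2 ]-- ... --[ R_{n-1} ]------------------------------------ ... ---------
                          |        |                 |
    |x_{n-2}⟩ ------------●--------|----- ... -------|-------[ H ]--[ R_1 ]-- ... --[ R_{n-2} ]-- ... ---------
                                   |                 |                 |                 |
        :                          :                 :                 :                 :
                                                     |                                   |
    |x_0⟩     ----------------------------- ... -----●------------------------- ... -----●------ ... --[ H ]--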


That’s good and wonderful, but we have not defined $R_k$ for k > 2 yet. However, the
final observation of the n = 3 case study suggested that it should be

\[
R_k \;\equiv\; \begin{pmatrix} 1 & 0 \\ 0 & \omega^{2^{\,n-k-1}} \end{pmatrix}.
\]

You can verify this by analyzing it formally, but it’s easiest to just look at the extreme
cases. No matter what n is, we want $R_1$’s lower-right element to be $i$, and $R_{n-1}$’s to be
$\omega$ (compare our n = 3 case study directly above), and you can verify that for k = 1
and k = n − 1, that’s indeed what we get.
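To see the general pattern at work, here is a small state-vector simulation, offered as a sketch under stated assumptions rather than a definitive implementation. NumPy and the helper names (`apply_1q`, `apply_controlled_phase`, `qft_circuit`) are hypothetical choices for this illustration; qubit 0 is the least significant; and $R_k$ is taken to be $\mathrm{diag}(1, e^{2\pi i/2^{k+1}})$, which equals $\mathrm{diag}(1, \omega^{2^{n-k-1}})$ when $\omega = e^{2\pi i/2^n}$. The final axis reversal stands in for the re-ordering component.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (q = 0 is least significant) of an n-qubit state."""
    psi = state.reshape([2] * n)           # axis 0 <-> qubit n-1, axis n-1 <-> qubit 0
    axis = n - 1 - q
    psi = np.moveaxis(psi, axis, 0)
    psi = np.tensordot(gate, psi, axes=(1, 0))
    psi = np.moveaxis(psi, 0, axis)
    return psi.reshape(-1)

def apply_controlled_phase(state, phase, control, target):
    """Controlled diag(1, phase): multiply amplitudes where both bits are 1."""
    out = state.copy()
    for idx in range(len(out)):
        if (idx >> control) & 1 and (idx >> target) & 1:
            out[idx] *= phase
    return out

def qft_circuit(state, n):
    """H on each line, then controlled R_1..R_t from the lines below, then reversal."""
    for t in range(n - 1, -1, -1):         # top line (most significant input) first
        state = apply_1q(state, H, t, n)
        for k in range(1, t + 1):          # R_k on line t is controlled by line t-k
            phase = np.exp(2j * np.pi / 2 ** (k + 1))
            state = apply_controlled_phase(state, phase, control=t - k, target=t)
    axes = list(range(n - 1, -1, -1))      # reverse the order of the output qubits
    return state.reshape([2] * n).transpose(axes).reshape(-1)

def dft_matrix(n):
    """Matrix whose column x is the CBS expansion of QFT|x>."""
    N = 2 ** n
    omega = np.exp(2j * np.pi / N)
    return np.array([[omega ** (x * y) for x in range(N)] for y in range(N)]) / np.sqrt(N)

def circuit_matches_dft(n):
    N, F = 2 ** n, dft_matrix(n)
    basis = np.eye(N, dtype=complex)
    return all(np.allclose(qft_circuit(basis[x], n), F[:, x]) for x in range(N))
```

Calling `circuit_matches_dft(3)` or `circuit_matches_dft(4)` compares the circuit output against the defining sum for every basis input; the n = 4 case is one way to check an answer to the exercise above.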


Well, we have defined and designed the QFT circuit out of small, unitary gates.
Since you'll be using QFT in a number of circuit designs, you need to be able to cite
its computational complexity, which we do now.


22.5 Computational Complexity of QFT


This is the easy part because we have the circuit diagrams to lean on. 


Each gate is a single unitary operator. Some are 2-qubit gates (the controlled-$R_k$
gates) and some are single qubit gates (the H gates). But they are all constant time
and constant size, so we just add them up.


The topmost input line starting at $|x_{n-1}\rangle$ has n gates (count the two-qubit controlled-$R_k$s
as single gates in this top line, but you don’t have to count their control nodes
when you get to them in the lines below). As we move down to the lower lines,
observe that each line has one fewer gate than the one above until we get to the final
line, starting at $|x_0\rangle$, which has only one gate. That’s

\[
\sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \ \text{gates.}
\]

The circuit complexity for this is $O(n^2)$. Adding on a circuit that reverses the order
can only add an additional O(n) gates, but in series, not nested, so that does not
affect the circuit complexity. Therefore, we are left with a computational factor of

\[
O(n^2) \;=\; O(\log^2 N).
\]
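The count is easy to reproduce in a few lines of Python. The function name `qft_gate_count` is hypothetical, and the optional reversal cost of n // 2 swap gates is an assumption of this sketch (the text only says the reversal is O(n)).

```python
def qft_gate_count(n, include_reversal=False):
    """Count the H and controlled-R gates, line by line, for an n-qubit QFT circuit."""
    per_line = [n - j for j in range(n)]    # top line has n gates, bottom line has 1
    total = sum(per_line)
    if include_reversal:
        total += n // 2                     # assumed: one swap per exchanged pair of lines
    return total

counts = {n: qft_gate_count(n) for n in (1, 2, 3, 4, 10)}
# counts[3] is 6: three H gates plus three controlled-R gates.
```

For every n the total equals n(n + 1)/2, so doubling the number of qubits roughly quadruples the gate count, which is the $O(n^2)$ growth cited above.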


You might be tempted to compare this with the O(N log N) performance of the FFT,
but that’s not really an apples-to-apples comparison, for several reasons:


1. Our circuit computes QFT $|x\rangle^n$ for only one of the $N = 2^n$ basis states. We’d
have to account for the algorithm time required to repeat N passes through
the circuit which (simplistically) brings it to $O(N \log^2 N)$. While this can be
improved, the point remains: our result above would have to be multiplied by
something to account for all N output basis states.


2. If we were thinking of using the QFT to compute the DFT, we’d need to
calculate the N complex amplitudes, $\{\tilde{c}_k\}$, from the inputs $\{c_k\}$. They don’t
appear in our analysis because we implicitly considered the special N amplitudes
$\{\delta_{xk}\}_{k=0}^{N-1}$ that define the CBS. Fixing this feels like an O(N) proposition.


3. Even if we could repair the above with clever redesign, we have the biggest
obstacle: the output coefficients, which hold the DFT information, are amplitudes.
Measuring the output state collapses them, destroying their quantum
superposition.


Although we cannot (yet) use the QFT to directly compute a DFT with growth
smaller than the FFT, we can still use it to our advantage in quantum circuits, as
we will soon discover.


[Accounting for Precision. You may worry that increasingly precise matrix
multiplies will be needed in the 2 × 2 unitary matrices as n increases. This is a valid
concern. Fixed precision will only get us so far until our ability to generate and compute
with increasingly higher roots-of-unity will be tapped out. So the constant time
unitary gates are relative to, or “above,” the primitive complex (or real) multiplications
and additions. We would have to make some design choices to either limit n
to a maximum usable size or else account for these primitive arithmetic operations
in the circuit complexity. We'll take the first option: our n will remain below some
maximum $n_0$, and we build our circuitry and algorithm to be able to handle adequate
precision for that $n_0$ and all $n < n_0$. This isn’t so hard to do in our current problem
since n never gets too big: $n = \log_2 N$, where N is the true size of our problem, so we
won't be needing arbitrarily large ns in practice. If that doesn’t work for us in some
future problem, we can toss in the extra complexity factors and they will usually still
produce satisfactory polynomial big-Os for problems that are classically exponential.]


Further Improvements 


I'll finish by mentioning, without protracted analysis, a couple ways the above
circuits can be simplified and/or accelerated:


• If we are willing to destroy the quantum states and measure the output qubits
immediately after performing the QFT (something we are willing to do in most
of our algorithms), then the two-qubit (controlled-$R_k$) gates can be replaced
with 1-qubit gates. This is based on the idea that, rather than construct a
controlled-$R_k$ gate, we instead measure the controlling qubit first and then apply
$R_k$ based on the outcome of that measurement. This sounds suspicious, I know:
we’re still doing a conditional application of a gate. However, a controlled-$R_k$
does not destroy the controlling qubit and contains all the conditional logic
inside the quantum gate, whereas measuring a qubit and then applying a 1-qubit
gate based on its outcome moves the controlling aspect from inside the
quantum gate to the outer classical logic. It is much easier to build stable
conditioned one-qubit gates than two-qubit controlled gates. Do note, however,
that this does not improve the computational complexity.


• If m-bit accuracy is enough, where m < n, then we get improved complexity.
In this case, we just ignore the least-significant (n − m) output qubits.
That amounts to tossing out the top (n − m) lines, leaving only m channels to
compute. The new complexity is now $O(m^2)$ rather than $O(n^2)$.




Chapter 23 


Shor’s Algorithm 


23.1 The Role of Shor’s Algorithms in Computing 


23.1.1 Context for The Algorithms 


Shor’s algorithms are the crown jewels of elementary quantum information theory.
They demonstrate that a quantum computer, once realized, will be able to handle some
practical applications that are beyond the reach of the fastest existing supercomputers,
the most dramatic being the factoring of large numbers.


As you may know from news sources or academic reports, the inability of com- 
puters to factor astronomically large integers on a human timescale is the key to 
RSA encryption, and RSA encryption secures the Internet. Shor’s quantum factoring 
algorithm should be able to solve the problem in minutes or seconds and would be a 
disruptive technology should a quantum computer be designed and programmed to 
implement it. Meanwhile, quantum encryption, an advanced topic that we study in 
the next course, offers a possible alternative to Internet security that could replace 
RSA when the time comes. 


23.1.2 Period Finding and Factoring 


There are many ways to present Shor’s results, and it is easy to become confused 
about what they all say. We'll try to make things understandable by dividing our 
study into two parts adumbrated by two observations. 


1. Shor’s algorithm for period-finding is a relativized (read “not absolute”) exponential
speed-up over a classical counterpart. Like Simon’s algorithm, there are
periodic functions that do not have polynomial-fast oracles. In those cases the
polynomial complexity of the {circuit + algorithm} around the oracle will not
help.


2. Shor’s algorithm for factoring not only makes use of the period-finding algorithm,
but also provides an oracle for a specific function that is polynomial-time,
making the entire {circuit + algorithm} an absolute exponential speed-up over
the classical version.


23.1.3 The Period Finding Problem and its Key Idea


Informal Description of Problem


Shor’s period-finding algorithm seeks to find the period, a, of a periodic function f(k)
whose domain is a finite set of integers, i.e., $k \in \mathbb{Z}_M = \{0, 1, 2, \ldots, M-1\}$.
Typically M can be very large; we can even consider the domain to be all of $\mathbb{Z}$,
although the $\mathbb{Z}_M$ case will be shown to be an equivalent finitization that makes the
problem tractable. I'll state this explicitly in a couple screens. Here, we want to get
a general picture of the problem and the plan.


First, for this to even make sense a has to be less than M so that such periodicity
is “visible” by looking at the M domain points available to us.


Second, this kind of periodicity is more “traditional” than Simon’s, since here we
use ordinary addition to describe the period a via f(x) = f(x + a) rather than the
exotic mod-2 periodicity of Simon, in which f(x) = f(x ⊕ a).


Shor’s Circuit Compared to Simon’s Circuit 


We'll see that the circuit and analysis have the same general framework as that of
Simon’s algorithm with the main difference being a post-oracle QFT gate rather
than a post-oracle Hadamard gate. What’s the big idea behind this change? At the
highest level of analysis it is quite simple even if the details are thorny.


If we put a maximally mixed superposition into the oracle as we have done for all 
previous algorithms (and will do here) then a post-oracle measurement in the standard 
basis should give all possible values of f with equal probabilities. That won’t do. We 
have to measure along a basis that will be biased toward giving information. For 
Simon, that happened to be the x-basis, thus the use of a final Hadamard operator. 
What is the right basis in which to measure when we are hoping to discover an integer 
function’s period? 


Why the QFT Will Help 


Our lectures on classical Fourier series and transforms had many ideas and consequences
of which I’d like to revive two.


1. Period, T, and frequency, f (not the function, f(x), but its frequency) are
related by

\[
T \cdot f \;=\; \text{constant},
\]

where the constant is usually 1 or 2π for continuous functions and the vector
size, M, for discrete functions.


2. If a discrete function is periodic, its spectrum DFT(f) will have values which
are mostly small or zero except at domain points that are multiples of the
frequency (see Figure 23.1).


Figure 23.1: The spectrum of a vector with period 8 and frequency 16 = 128/8


In very broad — and slightly inaccurate — terms, this suggests we query the spectrum of 
our function, f(x), ascertain its fundamental frequency, m, (the first non-zero spike) 
and from it get the period, a = M/m. 
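A classical toy version of this idea can be tried directly. The sketch below is an illustration under assumptions: it uses NumPy's `numpy.fft.fft` (whose exponent sign is opposite to $\omega^{xy}$ with $\omega = e^{2\pi i/N}$, which does not affect magnitudes), and the one-period `pattern` is an arbitrary made-up example. It builds a length-128 vector with period 8, finds the first non-zero spike m in its spectrum, and recovers the period as a = M/m.

```python
import numpy as np

M, a = 128, 8
pattern = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=float)   # one period, arbitrary values
f = np.tile(pattern, M // a)                                 # periodic vector of size M

spectrum = np.abs(np.fft.fft(f)) / np.sqrt(M)                # magnitudes of the DFT
support = np.nonzero(spectrum > 1e-9)[0]                     # where the spectrum is non-zero

m = int(support[support > 0].min())                          # fundamental frequency
recovered_period = M // m
```

Here the spikes sit only at multiples of 16 = 128/8, so the recovered period is 8. The quantum algorithm cannot read the whole spectrum this way, which is why the text calls this picture slightly inaccurate.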


But what does it mean to “query the frequency?” That’s code for “take a post-oracle
measurement in the Fourier basis.” We learned that measuring along a non-preferred
basis is actually applying the operator that converts the preferred basis to
the alternate basis, and for frequencies of periodic functions, that gate is none other
than the QFT.


Well this sounds easy, and while it may motivate the use of the QFT, figuring
out how to use it and what to test will consume the next two weeks.


How We’ll Set-Up the State Prior to Applying the QFT


Another thing we saw in our Fourier lectures was that we get the cleanest, easiest-to- 
analyze spectrum of a periodic function when we start with a pure periodic function 
in the spatial (or time) domain, and then apply the transform. In the continuous case 
that was a sinusoid or exponential, e.g., sin 3x (Figure 23.2). 


Figure 23.2: sin(3x) and its spectrum (panels: f(x) and Im[F(s)])


In the discrete case it was a function that was zero everywhere except for a single
k < a and its translates by multiples of a, i.e., a vector whose only non-zero
coordinates sit at positions k, k + a, k + 2a, ....


Such overtly periodic vectors have DFTs in which all the non-zero frequencies in the
spectrum have the same amplitudes, as shown in Figure 23.3.


Figure 23.3: The spectrum of a purely periodic vector with period 8 and frequency
16 = 128/8
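The equal-amplitude claim can be checked with a short NumPy sketch. The specific numbers (M = 128, a = 8, offset k = 3) are illustrative assumptions, and `numpy.fft.fft` is used only for magnitudes, so its sign convention does not matter here.

```python
import numpy as np

M, a, k = 128, 8, 3
v = np.zeros(M)
v[k::a] = 1.0                       # non-zero only at k, k + a, k + 2a, ...
v /= np.linalg.norm(v)              # normalize, as a quantum state would be

spec = np.abs(np.fft.fft(v)) / np.sqrt(M)

m = M // a                          # frequency: spacing of the spikes
spikes = spec[::m]                  # values at multiples of the frequency
others = np.delete(spec, np.arange(0, M, m))
```

All a = 8 spikes have the same magnitude, $1/\sqrt{a}$, and every other entry vanishes; changing the offset k alters only the phases of the spikes, not their sizes.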


We will process our original function so that it produces a purely periodic cousin with 
the same period by 


1. putting a maximally mixed state into the oracle’s A register to enable quantum 
parallelism, and 


2. “conceptually” collapsing the superposition at the output of the oracle by taking 
a B register measurement and applying the generalized Born rule. 


This will leave a “pure” periodic vector in the A register which we can send through
a post processing QFT gate and, finally, measure. We may have to do this more
than once, thereby producing several measurements. To extract m, and thus a, from
the measurements we will apply some beautiful mathematics.


The Final Approach 


After first defining the kind of periodicity that Shor’s work addresses, we begin the 
final leg of our journey that will take us through some quantum and classical terrain. 
When we’re done, you will have completed the first phase in your study of quantum 
computation and will be ready to move on to more advanced topics. 


The math that accompanies Shor’s algorithms is significant; it spans areas as
diverse as Fourier analysis, number theory, complex arithmetic and trigonometry.
We have covered each of these subjects completely. If you find yourself stuck on some
detail, please search the table of contents in this volume for a pointer to the relevant
section.


23.2 Injective Periodicity 


23.2.1 Functions of the Integers 


The kind of periodicity Shor’s algorithm addresses can be expressed in terms of functions
defined over all the integers, $\mathbb{Z}$, or over just a finite group of integers like $\mathbb{Z}_M$.
The two definitions are equivalent, but it helps to define periodicity both ways so we
can speak freely in either dialect. Here we’ll consider all integers, and in the next
subsection we'll deal with $\mathbb{Z}_M$.


We've discussed ordinary periodicity, like that of a function sin x, as well as $(\mathbb{Z}_2)^n$
periodicity, studied in Simon’s algorithm. Shor’s periodicity is much closer to ordinary
periodicity, but has one twist that gives it a pinch of Simon’s more exotic variety.


A function defined on $\mathbb{Z}$,

\[
f : \mathbb{Z} \longrightarrow S, \qquad S \subset \mathbb{Z},
\]

is called periodic injective if there exists an integer a > 0 (called the
period), such that

for all x ≠ y in $\mathbb{Z}$, we have

\[
f(x) = f(y) \iff y = x + ka, \ \text{some integer } k.
\]

The term “injective” will be discussed shortly. Because of the if and only if (⟺) in
the definition, we don’t need to say “smallest” or “unique” for a. Those conditions
follow naturally.


Figure 23.4: Graph of two periods of a periodic injective function 


Caution: Some authors might call f “a-periodic” to make the period visible,
omitting the reference to injectivity. Others might just call f “periodic” and let you
fend for yourselves.

In theory, S, the range of f, can be any set: integers, sheep or neutrinos. It’s the
periodicity that matters, not what the functional values are. Still, we will typically
consider the range to be a subset of $\mathbb{Z}$, most notably, $\mathbb{Z}_{2^r}$ (for some positive integer
r). This will make our circuit analysis clearer.


23.2.2 Functions of the Group $\mathbb{Z}_M$


A little consideration of the previous definition should convince you that any periodic
injective function with period a can be confined to a finite subset of $\mathbb{Z}$ which contains
the interval [0, a). To “feel” f’s periodicity, though, we’d want M to contain at least
a few “copies of a” inside it, i.e., we would like M > 3a or M > 1000a. It helps if
we assume that we do know such an M, even if we don’t know the period, a, and let
f be defined on $\mathbb{Z}_M$, rather than the larger $\mathbb{Z}$. The definition of periodic injective in
this setting would be as follows.


A function defined on $\mathbb{Z}_M$,

\[
f : \mathbb{Z}_M \longrightarrow S, \qquad S \subset \mathbb{Z},
\]

is called periodic injective if there exists an integer $a \in \mathbb{Z}_M$ (called the
period), such that

for all x ≠ y in $\mathbb{Z}_M$, we have

\[
f(x) = f(y) \iff y = x + ka, \ \text{some integer } k.
\]

As before, S can be any set, but we’ll be using $\mathbb{Z}_{2^r}$.


The phrase

    “y = x + ka”

uses ordinary addition, not mod-M addition. We are not saying that we can let
x + ka wrap around M back to 0, 1, 2, ... and find more numbers that are part of
the f(x) = f(y) club. For example, the function

    f(x) = x % 8

is periodic injective with period 8 according to either definition. However, when we
look at the second definition and take, say, M = 20 as the integer that we proclaim
to be larger than a, defining the domain $\mathbb{Z}_{20}$, then only the numbers {3, 11, 19},
which are of the form 3 + k8, should be considered pre-images of 3. Were we
to allow 3 + k8 (mod 20), then the members of this club would grow (illogically) to
{3, 11, 19, 7, 15}. This does not track; while f(3) = f(11) = f(19) = 3, it is
not true of f(7) or f(15), both of which are 7.
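The example is small enough to enumerate. In this sketch the helper name `preimage_club` is hypothetical; it lists every y in $\mathbb{Z}_{20}$ with f(y) = f(3), and then shows what the (incorrect) wrap-around reading would have produced.

```python
def f(x):
    return x % 8

def preimage_club(func, x, M):
    """All y in Z_M with func(y) == func(x)."""
    return [y for y in range(M) if func(y) == func(x)]

M = 20
club = preimage_club(f, 3, M)                          # ordinary addition: 3 + 8k inside Z_20
wrapped = sorted({(3 + 8 * k) % M for k in range(5)})  # mod-20 wrap-around (not what we mean)
impostors = [y for y in wrapped if f(y) != f(3)]       # members that do not map to 3
```

The ordinary-addition club is [3, 11, 19]; the wrap-around version adds 7 and 15, and both of those map to 7 rather than 3.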


23.2.3 Discussion of Injective Periodicity 

The term injective is how mathematicians say “1-to-1”. Also, periodic injective is
seen in the quantum computing literature, so I think it's worth using (rather than the
rag-tag 1-to-1-periodicity, which is a hyphen extravaganza). But what, exactly, are
we claiming to be injective? A periodic function is patently non-injective, mapping
multiple domain points to the same image value. Where is there 1-to-1-ness? It
derives from the following fact which is a direct consequence of the definition:


The if-and-only-if (⟺) condition in the definition of periodic injective implies
that, when restricted to the set [0, a) = {0, 1, 2, ..., a − 1}, f is 1-to-1. The same
is true of any set of ≤ a consecutive integers in the domain.


[Exercise. Prove it.]


Figure 23.5: Example of a periodic function that is not periodic injective
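A brute-force check of the definition on $\mathbb{Z}_M$ makes the distinction concrete. This is a sketch only: the function name `is_periodic_injective` and the two sample functions are assumptions made for illustration, and a finite check like this is no substitute for the proof requested in the exercise.

```python
def is_periodic_injective(func, a, M):
    """Test f(x) == f(y)  <=>  y = x + k*a (ordinary addition) for all x != y in Z_M."""
    for x in range(M):
        for y in range(M):
            if x == y:
                continue
            same_value = func(x) == func(y)
            differ_by_multiple = (y - x) % a == 0
            if same_value != differ_by_multiple:
                return False
    return True

def injective_example(x):
    return x % 8                    # 1-to-1 on each block of 8 consecutive integers

def merely_periodic(x):
    return (x % 8 - 4) ** 2         # period 8, but x % 8 = 3 and x % 8 = 5 collide

results = (
    is_periodic_injective(injective_example, 8, 20),
    is_periodic_injective(merely_periodic, 8, 20),
    is_periodic_injective(injective_example, 4, 20),   # wrong period is rejected too
)
```

The second function repeats every 8 steps yet takes the same value twice within one period, so it is periodic without being periodic injective.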


This is the twist I promised had some kinship to Simon’s periodic functions. Recall
that they were also 1-to-1 on each of their two disjoint cosets R and Q in that conversation.
The property is required in our treatment of Shor’s period-finding as well; we
must be able to partition f’s domain into disjoint sets on which f is 1-to-1. This is
not to say that the property is necessary in order for Shor’s algorithm to work (there
may be more general results that work for vanilla flavored periodicity). However, the
majority of historical quantum period-finding proofs make use of injective periodicity,
whether they call it that, or not. For factoring and encryption-breaking, that
condition is always met.

While we’re comparing Simon to Shor, let’s discuss a difference. In Simon’s case,
if we found even one pair of elements, x′ ≠ y′ with f(x′) = f(y′), then we knew
a. However, the same cannot be said of Shor’s problem. All we would know in the
current case is that x′ and y′ differ by a multiple of a, but we would know neither a
nor the multiple.


23.3 Shor’s Periodicity Problem 


Statement of Shor’s Periodicity Problem 


Let $f : \mathbb{Z}_M \to \mathbb{Z}$ be injective periodic.
Find a.


The M in the problem statement is not about the periodicity (a describes that) as 
much as it is about how big the problem is; it gives us a bound on a. It is M which 
is used to measure the computational complexity of the algorithm; how does the 
{algorithm + circuit} grow as M gets larger? 




Relativized vs. Absolute Speed-Up 


I feel compelled to say this again before we hit the road. Our goal is to produce a
quantum algorithm that completes in polynomial time when its classical counterpart
has exponential complexity. The time complexity of Shor’s period-finding algorithm
will end up being limited by both the circuits/algorithms we build around our oracle
$U_f$, as well as f, itself. So if f cannot be computed in polynomial time, at least
by some quantum circuit, we won’t end up with an easy algorithm. We will prove
all the circuits and algorithms around Uy to be of polynomial complexity, specifi- 
cally O(log? M). We will also show (in a future lesson) that the f needed for RSA 
encryption-breaking is O(log* M), so the factoring problem is quantum-easy compared 
with a hard problem using classical circuits/algorithms. 

The point is that there are two problems: period-finding, which has quantum
relativized speed-up, and factoring, which has quantum absolute speed-up. Our job is
to make sure that the quantum machinery around $U_f$ is “polynomial,” so that when
f, itself, is polynomial (as is the case in factoring) we end up with a problem that is
absolutely easy in the quantum realm.


One Convenient Assumption

In the development of the algorithm it will also help to add the very weak assumption

\[
a < M/2,
\]

i.e., the periodicity cycles at least twice in the interval [0, M − 1].


Figure 23.6: We add the weak assumption that 2(+) a-intervals fit into [0, M), i.e., M > 2a


In fact, when we apply Shor’s periodicity algorithm to RSA encryption-breaking or 
the factoring problem, this added assumption will automatically be satisfied. Usually, 
we have an even stronger assumption, namely that a << M. 


In the general case, we'll see that even if we are only guaranteed that a ≤ .9999M,
we still end up proving that Shor’s problem is solvable in polynomial time by a
quantum circuit. But there’s no reason to be so conservative. Even a < M/2 is
overdoing it for practical purposes, yet it makes all of our estimates precise and easy
to prove without any hand-waving. Also, it costs us nothing algorithmically, as we
will also see.


[Figure: a number line marked 0, a, 2a, 3a, ..., 100000a, labeled M > a]


Figure 23.8: Our proof will also work for only one a interval in [0, M), with a = .9999M


23.3.1 Definitions and Recasting the Problem 


As stated, the problem has relatively few moving parts: an unknown period, a, a
known upper bound for a, M, and the injective periodicity property. To facilitate
the circuit and algorithm, we’ll have to add a few more letters: n, N and r. Here are
their definitions.


A Power of 2, $2^n$, for the Domain Size

Let n be the exponent that establishes the closest power-of-2 at or above $M^2$,

\[
2^{n-1} \;<\; M^2 \;\le\; 2^{n}.
\]

We’ll use the integer interval [0, 1, ..., $2^n$ − 1] as our official domain for f, and we’ll
let N be the actual power-of-2,

\[
N = 2^{n}.
\]

Since M > 2a we are guaranteed that [0, N − 1] will contain at least as many intervals
of size a as [0, M − 1] does (because N ≥ M² ≥ M).

[You’re worried about how to define f beyond the original domain limit M? Stop
worrying. It’s not our job to define f, just discover its period. We know that f is
periodic with period a, even though we don’t know a yet. That means its definition
can be extended to all of $\mathbb{Z}$. So we can take any size domain we want. Stated another
way, we assume our oracle can compute f(x) for any x.]


The reason for bracketing M? like this only becomes apparent as the plot unfolds. 
Don’t be intimidated into believing anyone could predict we would need these exact 




[Figure: number line marked 0, a, M, N/2, $M^2$, N, showing the problem's period bound M and the bracket (N/2, N] for $M^2$]

Figure 23.9: $N = 2^n$ chosen so (N/2, N] bracket $M^2$


bounds so early in the game. Even the pioneers certainly got to the end of the 
derivation and noticed that these limits would be needed, then came back and added 
them up front. That’s exactly how you will do it when you write up the quantum 
algorithms that you discover. 
The bracketing of $M^2$, when written in terms of N, looks like

\[ \frac{N}{2} \;<\; M^2 \;\le\; N . \]


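A Power of 2, $2^r$, for the Range Size


Also, without loss of generality, we assume that the range of f is a subset of $\mathbb{Z}_{2^r}$ for
some sufficiently large r > 0, i.e.,

\[ \mathrm{ran}(f) \;\subseteq\; [0,\, 2^r - 1], \quad \text{also written} \]
\[ 0 \;\le\; f(x) \;<\; 2^r . \]


Summarizing all this,

\[ f : [0,\, 2^n - 1] \;\longrightarrow\; [0,\, 2^r - 1], \quad \text{also written} \]
\[ f : [0,\, N - 1] \;\longrightarrow\; [0,\, 2^r - 1], \quad N = 2^n . \]
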
The Equivalence of N and M in Time Complexity 


Let’s be certain that using N for the time complexity is the same as using M. Taking
the log (base 2) of the inequality that brackets $M^2$, we get

\[ \log\frac{N}{2} \;<\; \log M^2 \;\le\; \log N, \quad \text{or} \]
\[ \log N - \log 2 \;<\; 2\log M \;\le\; \log N . \]

The big-O of every expression in this equation will kill the constants and weaken <
to ≤, producing

\[ O(\log N) \;\le\; O(\log M) \;\le\; O(\log N), \]




and since the far left and far right are equal, they both equal the middle, i.e.

\[ O(\log N) \;=\; O(\log M) . \]

Therefore, a growth rate of any polynomial function of these two will also be equal,

\[ O(\log^p N) \;=\; O(\log^p M) . \]


Thus, bracketing M? between N/2 and N allows us to use N to compute the com- 
plexity and later replace it with M. Specifically, we'll eventually compute a big-O of 
log? N for Shor’s algorithm, implying a complexity of log? M. 


23.3.2 The $\mathbb{Z}_N$ – $(\mathbb{Z}_2)^n$ – CBS Connection


We’re all full of energy and eager to start wiring this baby up, but there is one final 
precaution we should take lest we find ourselves adrift in a sea of math. On the one 
hand, the problem lives in the world of ordinary integer arithmetic. We are dealing 
with simple functions and sums like, 


f(18) = 6, and 
6 + 7 = 13.


On the other hand, we will be working with a quantum circuit which relies on mod-2
arithmetic, most notably the oracle’s B register output,

\[ |y \oplus f(x)\rangle^r , \]

or, potentially more confusing, ordinary arithmetic inside a ket, as in the expression

\[ |x + ja\rangle^n . \]


There’s no need to panic. The simple rule is that when you see +, use ordinary
addition and when you see ⊕ use mod-2.


• $|y \oplus f(x)\rangle^r$. We’re familiar with the mod-2 sum and its use inside a ket,
especially when we are expressing $U_f$’s target register. The only very minor
adjustment we’ll need to make arises from the oracle’s B channel being r qubits
wide (where $2^r$ is f’s range size) instead of the same n qubits of the oracle’s A
channel (where $2^n$ is f’s domain size). We’ll be careful when we come to that.


• $|x + ja\rangle^n$. As for ordinary addition inside the kets, this will come about when
we partition the domain into mutually exclusive “cosets,” a process that I’ll
describe shortly. The main thing to be aware of is that the sum must not extend
beyond the dimension of the Hilbert space in which the ket lives, namely $2^n$.
That’s necessary since an integer x inside a ket $|x\rangle^n$ represents a CBS state,
and there are only $2^n$ of those, $|0\rangle^n, \ldots, |2^n - 1\rangle^n$. We’ll be sure to obey that
rule, too.


Okay, I’ve burdened you with eye protection, seat belts and other safety equipment,
and I know you’re bursting to start building something. Let’s begin.




23.4 Shor’s Quantum Circuit Overview and the 
Master Plan 


23.4.1 The Circuit 


The total circuit looks very much like Simon’s. 


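[Circuit diagram: the A register, $|0\rangle^n$, passes through $H^{\otimes n}$, the oracle $U_f$ and $QFT^{(N)}$ before its (actual) measurement; the B register, $|0\rangle^r$, passes through $U_f$ before its (conceptual) measurement.]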


[Note: As with Simon, I suppressed the hatching of the quantum wires so as to 
produce a cleaner looking circuit. The A channel has n lines, and the B channel has 
r lines, as evinced by the kets and operators which are labeled with the “exponents” 
n, N and r.] 


There are two multi-dimensional registers, the upper A register, and the lower 
B register. A side-by-side comparison of Shor’s and Simon’s circuits reveals two 
differences: 


1. The post-oracle’s A register is processed by a quantum Fourier transform in- 
stead of a multi-order Hadamard gate. 


2. Less significant is the size of the B register. Rather than it being an n-fold tensor
space of the same dimension, $2^n$, as the A register, it is an r-fold space, with
a smaller dimension, $2^r$. This reflects the fact that we know f to be periodic,
with period a, forcing the number of distinct image values of f to be exactly a
because of injective periodicity. Well, $a < M < M^2 \le 2^n$, so there you have it.
These images can be reassigned to fit into a vector space of dimension smaller,
usually much smaller, than A’s $2^n$. (Remember that we don’t care what the
actual images are, sheep or neutrinos, so they may as well be $0, 1, \ldots, 2^r - 1$.) An
exact value for r may be somewhat unclear at this point; all we know is that
it need never be more than n, and we’d like to reserve the right to give it a
different value by using a distinct variable name.


23.4.2 The Plan 
Initial State Preparation 


We prepare the separable state $|0\rangle^n \otimes |0\rangle^r$ as input.


Data Channel. The top line uses a multi-dimensional Hadamard to turn its $|0\rangle^n$
into a maximally mixed (and perfectly balanced) superposition that it passes on to 
the oracle’s top input, thus setting up quantum parallelism. 




Target Channel. The bottom line forwards its $|0\rangle^r$ directly on to the quantum
oracle’s B register, a move that (we saw with Simon) anticipates an application of 
the generalized Born rule. 


At that point, we conceptually test the B register output, causing a collapse of 
both registers (Born rule). We'll analyze what’s left in the collapsed A register’s 
output, (with the help of a “re-organizing,” QFT gate). We’ll find that only a very 
small and special set of measurement results are likely. And like Simon’s algorithm, 
we may need more than one sampling of the circuit to get an adequate collection of 
useful outputs on the A-line, but it’ll come very quickly due to the probabilities. 


Strategy 


Up to now, I’ve been comparing Shor to Simon. There’s an irony, though, when we
come to trying to understand the application of the final post-oracle, pre-measurement
gate. It was quite difficult to give a simple reason why a final Hadamard gate did
the trick for Simon’s algorithm (I only alluded to a technical lemma back then). But
the need for a final QFT, as we have already seen, is quite easy to understand:
we want the period, so we measure in the Fourier basis to get the fundamental
frequency, m. Measuring in the Fourier basis means applying a z basis-to-Fourier basis
transformation, i.e., a QFT. m gets us a and we go home early.


One wrinkle rears its head when we look at the spectrum of a periodic function,
even one that is pure in the sense described above. While the likely (or in some cases
only) measurement possibilities may be limited to a small subset $\{cm\}_{c=0}^{a-1}$ where
m = N/a is the frequency associated with the period a, we don’t know which cm we
will measure; there are a of them and they are all about equally likely. You’ll see why
we should expect to get lucky.


Figure 23.10: Eight highly probable measurement results, cm, for N = 128 and a = 8 


So, while we’ll know we have measured a multiple cm of the frequency, we won’t know 
which multiple. As it happens, if we are lucky enough to get a multiple c that has the 
bonus feature of being relatively prime (coprime) to a, we will be able to use it to find a.


A second wrinkle is that despite what I’ve led you to believe through my pictures,
the likely measurements aren’t exact multiples cm of the frequency, m. Instead they
will be a values $y_c$, for $c = 0, 1, \ldots, (a - 1)$, which are very close to cm. We’ll have
to find out how to lock on to the nearby cm associated with our measured $y_c$. Still,
when we do, a c coprime to a will be the most desirable multiple that will lead
to a.




Two Forks 


As we proceed, we’ll get to a fork in the road. If we take the right fork, we’ll find an
easy option. However the left fork will require much more detailed math. That’s the
hard option. In the easy case the spectrum measurement will yield an exact multiple,
cm, of m that I spoke of above. The harder, general case will give us only a $y_c$ close
to cm. Then we’ll have to earn our money and use some math to hop from the $y_c$ on
which we landed to the nearby cm that we really want.


That’s the plan. It may not be a perfect plan, but I think that’s what I like about 
it. 


23.5 The Circuit Breakdown 


We'll segment the circuit into the familiar sub-sections. 


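[Circuit diagram: the same circuit as before ($|0\rangle^n$ through $H^{\otimes n}$, $U_f$ and $QFT^{(N)}$ on the A register; $|0\rangle^r$ through $U_f$ on the B register), with three access points marked beneath it: A (after the Hadamard), B (after the oracle) and C (after the QFT).]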

Since many of the sections are identical to what we’ve done earlier, the analysis is 
also the same. However, I’ll repeat the discussion of those common parts to keep this 
lecture somewhat self-contained. 


23.6 Circuit Analysis Prior to Conceptual Measurement: Point B


23.6.1 The Hadamard Preparation of the A register 


I suppose we would be well advised to make certain we know what the state looks like 
at access point A before we tackle point B, and that stage of the circuit is identical 
to Simon’s; it sets up quantum parallelism by producing a perfectly mixed entangled 




state, enabling the oracle to act on f(x) for all possible x, simultaneously.


[Circuit diagram: the full circuit with the Hadamard stage of the A register highlighted.]

Hadamard, $H^{\otimes n}$, in $\mathcal{H}_{(n)}$


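Even though we are only going to apply the $2^n$-dimensional Hadamard gate to the
simple input $|0\rangle^n$, let’s review the effect it has on any CBS $|x\rangle^n$.

\[ |x\rangle^n \;\longrightarrow\; H^{\otimes n} \;\longrightarrow\; \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{y=0}^{2^n-1} (-1)^{x \cdot y}\, |y\rangle^n , \]

where the dot product between vector x and vector y is the mod-2 dot product. When
applied to $|0\rangle^n$, this reduces to

\[ |0\rangle^n \;\longrightarrow\; H^{\otimes n} \;\longrightarrow\; \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{y=0}^{2^n-1} |y\rangle^n , \]

or, returning to the usual computational basis notation $|x\rangle^n$ for the summation,

\[ |0\rangle^n \;\longrightarrow\; H^{\otimes n} \;\longrightarrow\; \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{x=0}^{2^n-1} |x\rangle^n . \]

The output state of this Hadamard operator is the nth order x-basis CBS ket, $|+\rangle^n = |0\rangle^n_{\pm}$,
reminding us that Hadamard gates provide both quantum parallelism as well as
a z ↔ x basis conversion operator.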
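
To see the formula in action, the following minimal Python sketch tabulates the amplitudes of $H^{\otimes n}|x\rangle^n$ directly from the $(-1)^{x \cdot y}$ rule. The function name `hadamard_n` and the choice to encode a CBS ket by its integer label are my own illustrative assumptions.

```python
import math


def hadamard_n(x, n):
    """Amplitudes of the n-qubit Hadamard applied to CBS |x>.

    Entry y is (1/sqrt 2)**n * (-1)**(x . y), where x . y is the mod-2
    dot product of the binary digits of x and y.
    (Hypothetical helper, for illustration only.)
    """
    scale = (1 / math.sqrt(2)) ** n
    # The mod-2 dot product is the parity of the bitwise AND of x and y.
    return [scale * (-1) ** (bin(x & y).count("1") % 2) for y in range(2 ** n)]


amps = hadamard_n(0, 3)
print(amps)   # eight equal amplitudes, each (1/sqrt 2)**3
```

Feeding in x = 0 gives the perfectly balanced superposition used to drive the oracle; any other x gives the same magnitudes with a pattern of signs.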


23.6.2 The Quantum Oracle 


Next, we consider the oracle: 


[Circuit diagram: the full circuit with the quantum oracle, $U_f$, highlighted.]

Quantum Oracle




The only difference between this and Simon’s oracle is the width of the oracle’s B 
register. Today, it is r qubits wide, where r will be (typically) smaller than the n of 
the A register. 


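[Circuit diagram: the oracle $U_f$ sends the A input $|x\rangle^n$ to $|x\rangle^n$ and the B input $|0\rangle^r$ to $|0 \oplus f(x)\rangle^r = |f(x)\rangle^r$.]

\[ |x\rangle^n\, |0\rangle^r \;\longrightarrow\; U_f \;\longrightarrow\; |x\rangle^n\, |f(x)\rangle^r \]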


23.6.3 The Quantum Oracle on Hadamard Superposition Inputs


Next, we invoke linearity. 


We remarked in Simon’s circuit that the oracle’s B register output is not simply
f applied to the superposition state, a nonsensical interpretation since f only has a
meaning over its domain, $\mathbb{Z}_N$. Instead, we apply linearity to the maximally mixed
superposition $|0\rangle^n_{\pm}$ going into the oracle’s top register and find that

\[ \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{x=0}^{2^n-1} |x\rangle^n |0\rangle^r \;\longrightarrow\; U_f \;\longrightarrow\; \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{x=0}^{2^n-1} |x\rangle^n |f(x)\rangle^r , \]

which we see is a weighted sum of separable products, all weights being equal to
$(1/\sqrt 2)^n$. The headlines are these:


• The output is a superposition of separable terms $|x\rangle^n\, \frac{|f(x)\rangle^r}{(\sqrt 2)^n}$, the kind of sum
the generalized Born rule needs,

\[ |\varphi\rangle^{n+r} \;=\; |0\rangle^n |\psi_0\rangle^r \;+\; |1\rangle^n |\psi_1\rangle^r \;+\; \cdots \;+\; |2^n - 1\rangle^n |\psi_{2^n-1}\rangle^r . \]

An A-measurement of “x” would imply a B-state collapse to its (normalized)
partner, $|f(x)\rangle^r$.


• If we chose to measure the B register first, a similar collapse to its entangled
A partner would result, but in that scenario there would be (approximately)
N/a = m, not one, pre-images x for every f(x) value. We will study those
details shortly.


23.7 Fork-in-the-Road: An Instructional Case Followed by the General Case
At this point, we will split the discussion into two parallel analyses: 




1. an instructional/easy case that has some unrealistic constraints on the period, 
a, and 


2. the general/difficult case. 


The easy case will give us the general framework that we will re-use in the general 
analysis. Don’t skip it. The general case requires that you have your mind already 
primed with the key steps from the easy case. 


23.8 Intermezzo — Notation for GCD and Coprime 


We'll be using two classical concepts heavily when we justify Shor’s quantum algorithm,
and we’ll also need these results for time complexity estimation.


Basic Notation 


We express the fact that one integer, c, divides another integer, a, evenly (i.e. with
remainder 0) using the notation


c | a.

Also, we will symbolize the non-negative integers using the notation
$\mathbb{Z}_{\ge 0}$.

Now, assume we have two distinct non-negative integers, a and b, i.e.,


$a, b \in \mathbb{Z}_{\ge 0}$, $a > b$.


The following definitions are fundamental in number theory. 
23.8.1 Greatest Common Divisor 
gcd(a,b) ≡ largest integer, c, with c|a and c|b.


23.8.2 Coprime (Relatively Prime) 


If gcd(a,b) = 1 we say that a is coprime to b (or a and b are coprime). 
ad b is my shorthand for a 7s coprime to b. 


a ad b is my shorthand for a ts not coprime to b. 




23.9 First Fork: Easy Case (a|N) 


We now consider the special case in which a is a divisor of $N = 2^n$ (in symbols,
a|N). This implies $a = 2^l$. Immediately, we recognize that there’s really no need for


[Figure: number line from 0 to N marked 0, a, 2a, 3a, ..., (m−1)a, ma, with m intervals of length a and $ma = N = 2^n$]

Figure 23.11: Easy case covers a|N, exactly


a quantum algorithm in this situation because we can test for periodicity using classical
means by simply trying $2^l$ for l = 1, 2, 3, ..., (n−1), which constitutes O(log N)
trials. In the case of factoring, to which we’ll apply period-finding in a later lecture,
we'll see that each trial requires a computation of f(r) = y* (mod N) € O(log* N), 
y some constant (to be revealed later). So the classical approach is O(log N) relative 
to the oracle, and O(log’? N) absolute including the oracle, all without the help of a 
quantum circuit. However, the quantum algorithm in this easy case lays the founda- 
tion for the difficult case that follows, so we will develop it now and confirm that QC 
leads to at least O(log? N) classical complexity (we'll do a little better). 


23.9.1 Partitioning the Domain into Cosets 


It’s time to use the fact that f is periodic injective with (unknown) period a to help 
rewrite the output of the Oracle’s B register prior to the conceptual measurement. 
Injective-periodicity tells us that the domain can be partitioned (in more than one 
way) into many disjoint cosets of size a, each of which provides a 1-to-1 sub-domain 
for f. Furthermore, because we are in the easy case, these cosets fit exactly into the
big interval [0, N). Here’s how.


[Figure: number line from 0 to N marked 0, a, 2a, 3a, ..., (m−1)a, N, with the cosets R, R + a, R + 2a, ..., R + (m−1)a labeled above the successive intervals]

Figure 23.12: [0, N) is the union of distinct cosets of size a




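\[
\begin{aligned}
[0,\, N-1] &= [0,\, ma-1] \\
&= [0,\, a-1] \;\cup\; [a,\, 2a-1] \;\cup\; [2a,\, 3a-1] \;\cup\; \cdots \;\cup\; [(m-1)a,\, ma-1] \\
&= R \;\cup\; (R + a) \;\cup\; (R + 2a) \;\cup\; \cdots \;\cup\; (R + (m-1)a),
\end{aligned}
\]
where
R ≡ [0, a−1] = {0, 1, 2, ..., a−1},
a = period of f, and
m = N/a is the number of times a divides $N = 2^n$.


Definition of Coset. R + ja is called the jth coset of R.
We rewrite this decomposition relative to a typical element, x, in the base coset R.

\[
\begin{aligned}
[0,\, N-1] &= \{0,\, 1,\, \ldots,\, x,\, \ldots,\, a-1\} \;\cup\; \{a,\, 1+a,\, \ldots,\, x+a,\, \ldots,\, 2a-1\} \\
&\quad \cup\; \{2a,\, 1+2a,\, \ldots,\, x+2a,\, \ldots,\, 3a-1\} \;\cup\; \cdots \\
&\quad \cup\; \{(m-1)a,\, 1+(m-1)a,\, \ldots,\, x+(m-1)a,\, \ldots,\, ma-1\} \\
&= \bigcup_{j=0}^{m-1} \{\, x + ja \;\mid\; x \in [0, a) \,\}
\end{aligned}
\]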
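
The coset decomposition is easy to tabulate for small numbers. The sketch below is only an illustration: the helper names `cosets` and `siblings` are mine, and the toy function f(x) = x mod a stands in for an arbitrary injective-periodic f with period a.

```python
def cosets(N, a):
    """List the m = N/a cosets R + j*a of R = [0, a-1] inside [0, N-1].

    Easy case only: a must divide N. (Hypothetical helper.)
    """
    if N % a != 0:
        raise ValueError("easy case requires a | N")
    m = N // a
    return [[x + j * a for x in range(a)] for j in range(m)]


def siblings(x, N, a):
    """All partners x, x + a, ..., x + (m-1)a of a base element x in R."""
    return [x + j * a for j in range(N // a)]


N, a = 128, 8
f = lambda x: x % a          # toy injective-periodic function, period a

parts = cosets(N, a)
print(len(parts), parts[1])  # 16 cosets; the coset R + a is [8, ..., 15]
print(siblings(3, N, a))     # 3, 11, 19, ..., 123 all share f(x) = f(3)
```

Each coset is a 1-to-1 sub-domain for f, and the siblings of a fixed x are exactly the domain points that f cannot tell apart.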


23.9.2 Rewriting the Output of the Oracle 


It seems like a long time since we saw the original expression for the oracle’s output, so
let’s write it down again. It was

\[ \left(\frac{1}{\sqrt 2}\right)^{n} \sum_{x=0}^{2^n-1} |x\rangle^n |f(x)\rangle^r . \]

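Our new partition of the domain gives us a nice way to express this. Each element
x ∈ R has a unique partner in each of the cosets, R + ja, satisfying

\[
\left.
\begin{matrix}
x \\ x + a \\ x + 2a \\ \vdots \\ x + ja \\ \vdots \\ x + (m-1)a
\end{matrix}
\;\right\} \;\xrightarrow{\;\;f\;\;}\; f(x)
\]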

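Using this fact (and keeping in mind that $N = 2^n$) we only need to sum over the a
elements in R and include all the x + ja siblings in each term’s A register factor,

\[
\begin{aligned}
\frac{1}{\sqrt N} \sum_{x=0}^{N-1} |x\rangle^n |f(x)\rangle^r
&= \frac{1}{\sqrt N} \sum_{x=0}^{a-1} \left( \sum_{j=0}^{m-1} |x + ja\rangle^n \right) |f(x)\rangle^r \\
&= \sqrt{\frac{m}{N}} \sum_{x=0}^{a-1} \left( \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x + ja\rangle^n \right) |f(x)\rangle^r .
\end{aligned}
\]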


I moved a factor of $1/\sqrt{m}$ to the right of the outer sum so we could see that


1. each term in that outer sum is a normalized state (there are m CBS terms in
the inner sum, and each inner term has an amplitude of $1/\sqrt{m}$) and


2. the common amplitude remaining on the outside produces a normalized state
overall (there are a normalized terms in the outer sum, and each term has an
amplitude of $\sqrt{m/N} = 1/\sqrt{a}$).


23.9.3 Implication of a Hypothetical Measurement of the B Register Output


Although we won’t really need to do so, let’s imagine what happens if we were to 
apply the generalized Born rule now using the rearranged sum in which the B channel 
now plays the role of Born’s CBS factors and A channel holds the general factors. 


[Circuit diagram: the full circuit with the conceptual measurement of the B register, just after the oracle $U_f$, highlighted.]

Conceptual


As the last sum demonstrated, each B register measurement of f(x) will be attached
to not one, but m, input A register states. Thus, measuring B first, while collapsing 
A, merely produces a superposition of m states in that register, not a single, unique 
x from the domain. It narrows things down, but not enough to measure, 




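\[
\sqrt{\frac{m}{N}} \sum_{x=0}^{a-1} \left( \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x + ja\rangle^n \right) |f(x)\rangle^r
\;\;\searrow\;\;
\left( \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n \right) |f(x_0)\rangle^r
\]

(Here, $\searrow$ means collapses to.)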


If after measuring the post-oracle B register we were to go on to measure the A
register, its resulting collapse would give us one of the m values, $x_0 + ja$, but we
would have no way to extract a from that measurement; there is no real information
for us here, so we don’t measure A yet.


Let’s name the collapsed (but unmeasured) superposition state in the A register
$|\psi_{x_0}\rangle^n$, since it is determined by the measurement “$f(x_0)$” of the collapsed B register,

\[ |\psi_{x_0}\rangle^n \;\equiv\; \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n . \]


Motivation for Next Step 


Stand back and you'll see that we’ve accomplished one of the goals of our introductory
“key idea” section. The conceptual measurement of the B register leaves an overall
state in the A register in which all of the amplitudes are zero except for m that have
equal amplitude $1/\sqrt{m}$. Furthermore, those non-zero terms are spaced at intervals of
a in the N-dimensional vector: this is a “pure” periodic vector with the same period
a as our function f. We have produced a vector whose DFT is akin to that shown
in Figure 23.13. All non-zero amplitudes of the DFT are multiples of the frequency
m, i.e., of the form cm, c = 0, 1, ..., (a−1). (Due to an artifact of the graphing
software the 0 frequency appears after the array at phantom position N = 128.)


Figure 23.13: The spectrum of a purely periodic vector with period 8 and frequency 
16 = 128/8 


The Big Picture 


This strongly suggests that we apply the QFT to the A register in order to produce
a state that looks like Figure 23.13.




The Details 


We'll get our ideal result if we can produce an A register measurement, cm, where c 
is coprime to a. The following two thoughts will guide us. 


• The shift property of the QFT will turn the sum x + ja into a product involving
a root-of-unity,

\[ QFT^{(N)}\, |x - x_0\rangle^n \;=\; \frac{1}{\sqrt N} \sum_{y=0}^{N-1} \omega^{-x_0 y}\, \omega^{xy}\, |y\rangle^n . \]


• Sums of roots-of-unity are famous for canceling themselves out to produce a
“whole lotta 0.”


23.9.4 Effect of a Final QFT on the A Register


The much advertised QFT is applied to the conceptually semi-collapsed $|\psi_{x_0}\rangle^n$ at
the output of the oracle’s A register:


[Circuit diagram: the full circuit with the $QFT^{(N)}$ stage of the A register highlighted.]


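The QFT, being a linear operator, distributes over sums, so it passes right through
the Σ,

\[ \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n \;\longrightarrow\; QFT^{(N)} \;\longrightarrow\; \frac{1}{\sqrt m} \sum_{j=0}^{m-1} QFT^{(N)}\, |x_0 + ja\rangle^n . \]

The QFT of each individual term is

\[
\begin{aligned}
QFT^{(N)}\, |x_0 + ja\rangle^n &= \frac{1}{\sqrt N} \sum_{y=0}^{N-1} \omega^{(x_0 + ja)\,y}\, |y\rangle^n \\
&= \frac{1}{\sqrt N} \sum_{y=0}^{N-1} \omega^{x_0 y}\, \omega^{jay}\, |y\rangle^n ,
\end{aligned}
\]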




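so the QFT of the entire collapsed superposition is

\[
\begin{aligned}
\frac{1}{\sqrt m} \sum_{j=0}^{m-1} QFT^{(N)}\, |x_0 + ja\rangle^n
&= \frac{1}{\sqrt m} \sum_{j=0}^{m-1} \frac{1}{\sqrt N} \sum_{y=0}^{N-1} \omega^{x_0 y}\, \omega^{jay}\, |y\rangle^n \\
&= \frac{1}{\sqrt{mN}} \sum_{y=0}^{N-1} \sum_{j=0}^{m-1} \omega^{x_0 y}\, \omega^{jay}\, |y\rangle^n \\
&= \frac{1}{\sqrt{mN}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{m-1} \omega^{jay} \right) |y\rangle^n .
\end{aligned}
\]

Rather than invoke QFT’s shift property, we actually re-derived it in-place.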


Summary. This, then, is the superposition state (and our preferred organization
of that expression) just prior to sampling the final A register output:

\[ \frac{1}{\sqrt{mN}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{m-1} \omega^{jay} \right) |y\rangle^n \]


We can measure it at any time, and we next look at what the probabilities say we 
will see when we do. 
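
A small numerical experiment can confirm that the reorganized double sum agrees with a brute-force transform of the collapsed state. This is only a sketch under stated assumptions: the helper names are mine, the transform is a direct O(N²) sum rather than a quantum circuit, and I take ω = exp(2πi/N) as the sign convention (the comparison works the same way with the opposite sign).

```python
import cmath
import math


def root_power(k, N):
    """omega**k for omega = exp(2*pi*i/N); the sign convention is assumed."""
    return cmath.exp(2j * cmath.pi * (k % N) / N)


def qft(vec):
    """Direct transform: amplitude of |y> is (1/sqrt N) * sum_x vec[x] * omega**(x*y)."""
    N = len(vec)
    return [sum(vec[x] * root_power(x * y, N) for x in range(N)) / math.sqrt(N)
            for y in range(N)]


def collapsed_state(N, a, x0):
    """Amplitude 1/sqrt(m) at x0, x0 + a, ..., x0 + (m-1)a and 0 elsewhere."""
    m = N // a
    vec = [0.0] * N
    for j in range(m):
        vec[x0 + j * a] = 1 / math.sqrt(m)
    return vec


def closed_form(N, a, x0):
    """(1/sqrt(mN)) * omega**(x0*y) * sum_j omega**(j*a*y), for each y."""
    m = N // a
    return [root_power(x0 * y, N) * sum(root_power(j * a * y, N) for j in range(m))
            / math.sqrt(m * N) for y in range(N)]


N, a, x0 = 16, 4, 1
direct = qft(collapsed_state(N, a, x0))
formula = closed_form(N, a, x0)
max_gap = max(abs(d - f) for d, f in zip(direct, formula))
print(max_gap < 1e-12)   # True
```

The two lists agree to rounding error, which is all the "Summary" expression claims.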


Forgoing the B Register Measurement. Although we analyzed this under
the assumption of a B measurement, an A channel measurement really doesn’t care
about a “conceptual” B channel measurement. The reasoning is the same as in
Simon’s algorithm. If we don’t measure B first, the oracle’s output must continue to
carry the full entangled summation

\[ \sqrt{\frac{m}{N}} \sum_{x=0}^{a-1} \left( \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x + ja\rangle^n \right) |f(x)\rangle^r \]

through the final $[\,QFT^{(N)} \otimes 1^{\otimes r}\,]$. This would add an extra outer-nested sum

\[ \frac{1}{\sqrt a} \sum_{x \in [0,\,a)} \Big(\; \cdots \;\Big)\, |f(x)\rangle^r \]

to our “Summary” expression, above, making it the full oracle output, not just that
of the A register. Even leaving B unmeasured, the algebraic simplification we
get below will still take place inside the big parentheses above for each x, and the
probabilities won’t be affected. (Also, note that an A register collapse to one specific
$|x_0 + ja\rangle^n$ will implicitly select a unique $|f(x_0)\rangle^r$ in the B register.) With this overview,
try carrying this complete sum through the next section if you’d like to see its (non)
effect on the outcome.




23.9.5 Computation of Final Measurement Probabilities (Easy 
Case) 


We are now in an excellent position to analyze this final A register superposition and
see much of it disappear as a result of some of the properties of roots-of-unity that
we covered in a past lecture. After that, we can analyze the probabilities which will
lead to the algorithm. We proceed in five steps that will


1. identify a special set of a elements, $C = \{y_c = cm\}_{c=0}^{a-1}$, of certain measurement
likelihood,


2. observe that each of $y_c = cm$ will be measured with equal likelihood,


3. prove that a random selection from [0, a − 1] will be coprime-to-a 50% of the
time,


4. observe that a y = cm associated with c coprime-to-a will be measured with
probability 1/2, and


5. measure y = cm associated with c coprime-to-a with arbitrarily high confidence
in constant time complexity.


23.9.6 Step I: Identify a Special Set of a Elements, $C = \{y_c\}_{c=0}^{a-1}$, of Certain Measurement Likelihood


After (conceptual) measurement/collapse of the B register to state $|f(x_0)\rangle^r$, the post-QFT
A register was left in the state:

\[ \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n \;\longrightarrow\; QFT^{(N)} \;\longrightarrow\; \frac{1}{\sqrt{mN}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{m-1} \omega^{jay} \right) |y\rangle^n \]


We look at the inner sum in parentheses in a moment. First, let’s recap some facts
about ω.

\[ \omega \;\equiv\; \omega_N \]

was our primitive Nth root of unity, so

\[ \omega^N = 1 . \]

Because m is the number of times a divides (evenly) into N,

\[ m = N/a , \]



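we conclude

\[ 1 \;=\; \omega^N \;=\; \omega^{am} \;=\; (\omega^a)^m . \]

In other words, we have shown that

\[ \omega_m \;\equiv\; \omega^a \]

is the primitive mth root of unity. Using $\omega_m$ in place of $\omega^a$ in the above sum produces a
form that we have seen before (lecture Complex Arithmetic for Quantum Computing,
section Roots of Unity, exercise (d)),

\[
\sum_{j=0}^{m-1} \omega^{jay} \;=\; \sum_{j=0}^{m-1} \omega_m^{\,jy} \;=\;
\begin{cases}
m, & \text{if } y \equiv 0 \pmod m \\
0, & \text{if } y \not\equiv 0 \pmod m
\end{cases}
\]

This causes a vast quantity of the terms in the QFT output (the double sum) to
disappear: only 1-in-m survives,

\[ \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n \;\longrightarrow\; QFT^{(N)} \;\longrightarrow\; \sqrt{\frac{m}{N}} \sum_{y \,\equiv\, 0 \ (\mathrm{mod}\ m)} \omega^{x_0 y}\, |y\rangle^n . \]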


Let’s think about how to quantify the property that y is 0 mod-m:

\[ y \equiv 0 \pmod m \quad\Longleftrightarrow\quad y = cm, \ \text{ for some } c = 0, 1, 2, \ldots, a-1 . \]

This defines the special set of size a which will be certain to contain our measured y:

\[ C \;\equiv\; \{cm\}_{c=0}^{a-1} . \]
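
The cancellation that carves out this special set can be checked numerically. In the following minimal sketch (my own helper name, and ω = exp(2πi/N) assumed as the sign convention), the inner sum over j is evaluated for every y and compared against the m-or-0 rule.

```python
import cmath


def inner_sum(y, N, a):
    """sum_{j=0}^{m-1} omega**(j*a*y), with omega = exp(2*pi*i/N) and m = N/a.

    Hypothetical helper; the sign convention for omega is an assumption.
    """
    m = N // a
    return sum(cmath.exp(2j * cmath.pi * ((j * a * y) % N) / N) for j in range(m))


N, a = 128, 8
m = N // a
survivors = [y for y in range(N) if abs(inner_sum(y, N, a)) > 1e-9]
print(survivors)                       # the multiples of m = 16
print(abs(inner_sum(32, N, a) - m))    # essentially 0: the sum equals m there
```

Only the a = 8 values y = cm survive; every other y sees the m roots of unity cancel.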


23.9.7 Step II: Observe that Each cm Will be Measured with 
Equal Likelihood 


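Since the easy case assumes N = am, we know that $\sqrt{m/N} = \sqrt{1/a}$, so the last
equation becomes

\[ \frac{1}{\sqrt m} \sum_{j=0}^{m-1} |x_0 + ja\rangle^n \;\longrightarrow\; QFT^{(N)} \;\longrightarrow\; \frac{1}{\sqrt a} \sum_{c=0}^{a-1} \omega^{x_0 cm}\, |cm\rangle^n , \]

and we see that each cm is measured with probability 1/a.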




Example 


Consider a function that has period $8 = 2^3$ defined on a domain of size $128 = 2^7$. Our
problem variables for this function become

\[
\begin{aligned}
n &= 7 \\
N &= 2^n = 128 \\
a &= 2^3 = 8 \\
m &= \frac{N}{a} = \frac{128}{8} = 16 .
\end{aligned}
\]

Let’s say that we measured the B register and got the value $f(x_0)$ corresponding to
$x_0 = 3$. According to the above analysis the full pretested superposition,

\[
\frac{1}{\sqrt{128}} \sum_{x=0}^{127} |x\rangle^7 |f(x)\rangle^r
\;=\; \frac{1}{\sqrt{128}} \sum_{x=0}^{7} \Big( |x\rangle^7 + |x+8\rangle^7 + |x+16\rangle^7 + \cdots + |x+120\rangle^7 \Big)\, |f(x)\rangle^r ,
\]

when subjected to a B register measurement will collapse to

\[
|\psi_3\rangle^7\, |f(3)\rangle^r \;=\; \frac{1}{\sqrt{16}} \Big( |3\rangle^7 + |11\rangle^7 + |19\rangle^7 + \cdots + |123\rangle^7 \Big)\, |f(3)\rangle^r ,
\]

leaving the following vector in the A register (which I have “stacked” so as to align
the 16 non-zero coordinates):

\[
\frac{1}{4} \left(
\begin{matrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
  &   &   & \vdots &   &   &   &   \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{matrix}
\right)
\qquad \text{(16 rows of 8 coordinates, read row by row)}
\]


We are interested in learning the period but can’t really look at the individual
coordinates of this vector, it being a superposition which will collapse unpredictably.
But the analysis tells us that its QFT will produce a vector with only eight non-zero
amplitudes,

\[
QFT^{(128)}\, |\psi_3\rangle^7 \;=\; \sqrt{\frac{m}{N}} \sum_{y \,\equiv\, 0 \ (\mathrm{mod}\ m)} \omega^{3y}\, |y\rangle^7
\;=\; \frac{1}{\sqrt 8} \sum_{y \,\equiv\, 0 \ (\mathrm{mod}\ 16)} \omega^{3y}\, |y\rangle^7 ,
\]



each corresponding to a y which is a multiple of m = 16 having an amplitude-squared
(i.e., probability of collapse) = 1/8 = .125, all other probabilities being zero. Graphing
the square amplitudes of the QFT confirms this nicely. (See Figure 23.14.) So we do

[Figure: probability versus frequency y, with the horizontal axis running from 0 to 120]

Figure 23.14: Eight probabilities, .125, of measuring a multiple of m = 16


find that the only possible measurements in the frequency domain lie in the special 
set 


C = {0, 16, 2(16), 3(16), ..., 7(16)}. 


Match this with the graph to verify the spikes are at positions 16, 32, 48, etc. (Due 
to an artifact of the graphing software the 0 frequency appears after the array at 
phantom position 128.) If we can use this set to glean the frequency m = 16, we will 
be able to determine the period a = N/m = 128/16 = 8. 


Measuring along the frequency basis means applying the basis-transforming QFT
after the oracle and explains its presence in the circuit.
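
The numbers in this example can be reproduced classically with a brute-force transform. The sketch below is an illustration only: it computes an ordinary O(N²) discrete Fourier sum (with ω = exp(2πi/N) assumed) on the collapsed vector for N = 128, a = 8, $x_0$ = 3, and the variable names are mine.

```python
import cmath
import math


def qft(vec):
    """Direct transform: amplitude of |y> is (1/sqrt N) * sum_x vec[x] * omega**(x*y),
    with omega = exp(2*pi*i/N) (sign convention assumed)."""
    N = len(vec)
    return [sum(vec[x] * cmath.exp(2j * cmath.pi * ((x * y) % N) / N) for x in range(N))
            / math.sqrt(N) for y in range(N)]


N, a, x0 = 128, 8, 3
m = N // a
# Collapsed A register: amplitude 1/sqrt(m) at 3, 11, 19, ..., 123.
state = [1 / math.sqrt(m) if x % a == x0 else 0.0 for x in range(N)]

probs = [abs(amp) ** 2 for amp in qft(state)]
peaks = [y for y, p in enumerate(probs) if p > 1e-9]
print(peaks)                 # [0, 16, 32, 48, 64, 80, 96, 112]
print(round(probs[16], 6))   # 0.125
```

The eight spikes sit at the multiples of m = 16, each with probability 1/8, and the offset $x_0$ = 3 affects only the phases, not the probabilities.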


23.9.8 Step III: Prove that a Random Selection from [0, a−1] will be Coprime-to-a 50% of the Time


Specifically, we prove that the probability of a randomly selected c ∈ [0, a − 1] being
coprime to a is 1/2 (in the easy case).
Recall that, in this easy case, $N = 2^n$, and a|N, so $a = 2^l$ must also be a
power-of-two.

\[
\begin{aligned}
P(c \text{ coprime to } a) &= \frac{\#\ \text{coprimes to } a \,\in\, [0,\, a-1]}{a} \\
&= \frac{\#\ \text{odds} < 2^l}{2^l}, \quad \text{since } a = 2^l \\
&= \frac{1}{2} . \qquad \text{QED}
\end{aligned}
\]
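
The count behind this proof is simple to confirm by brute force. A minimal sketch, using Python's standard `math.gcd` and `fractions.Fraction`; the function name `coprime_fraction` is my own:

```python
from fractions import Fraction
from math import gcd


def coprime_fraction(a):
    """Fraction of c in [0, a-1] with gcd(c, a) == 1. (Hypothetical helper.)"""
    return Fraction(sum(1 for c in range(a) if gcd(c, a) == 1), a)


for l in range(1, 8):
    print(2 ** l, coprime_fraction(2 ** l))   # always 1/2 when a = 2**l, l >= 1
```

For a power of two the coprimes are exactly the odd numbers, which is why the fraction is 1/2 regardless of l (for l ≥ 1). For a general period the fraction varies, which is part of what makes the general case harder.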


23.9.9 Step IV: Observe that a y = cm Associated with c 
Coprime-to-a Will be Measured with Probability 1/2 


Our next goal will be to demonstrate that we not only measure some value in the
set C = {cm} in constant time, but that we can expect to get a special subset B ⊆ C




in constant time, namely those cm corresponding to c coprime to a. This
will enable us to find m (details shortly), and once we have m we get the period a
instantly from a = N/m.


In Step I, we proved that the likelihood of measuring one of the special C =
$\{y_c\}$ = {cm} was 100%. Then, in Steps II and III, we demonstrated that


• each of the cm will be measured with equal likelihood, and


• the probability of selecting a number that is coprime to a at random from the
numbers between 0 and a − 1 is (in this special easy case) 1/2, i.e., a constant,
independent of a.


We combine all this with the help of a little probability theory. The derivation may 
seem overly formal given the simplicity of the probabilities in this easy case, but it 
sets the stage for the difficult case where we will certainly need the formalism. 


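First we reprise some notation and introduce some new.

\[
\begin{aligned}
C &\equiv \{cm\}_{c=0}^{a-1} \\
B &\equiv \{\, y_b \mid y_b = bm \in C \text{ and } b \text{ is coprime to } a \,\} \qquad (\text{Note: } B \subseteq C.) \\
P(C) &\equiv P(\text{we measure some } y \in C) \\
P(B) &\equiv P(\text{we measure some } y \in B) \\
P(B\,|\,C) &\equiv P(\text{we measure some } y \in B \text{ given that the measured } y \text{ is known to be} \in C)
\end{aligned}
\]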


We would like a lower bound (specifically, 1/2) on the probability of measuring a
$y_c = cm$ which also has the property that its associated c is coprime to a. In symbols,
we would like to show:


Claim. P(B) ≥ 1/2, independent of a, M, N, etc.
Proof. We know that we will measure a y ∈ C with 100% certainty, so P(C) = 1.
We also showed that each cm was equally likely to be measured, so this is equivalent
to randomly selecting a c < a. Finally, we saw that randomly selecting a c < a would
also produce one that was coprime to a 50% of the time. So,

\[ P(B) \;=\; P(B\,|\,C)\, P(C) \;=\; \left(\frac{1}{2}\right) \cdot 1 \;=\; \frac{1}{2} . \qquad \text{QED} \]

We proved, in fact, P(B) = 1/2, exactly. Of course this means that

\[ P(c \text{ not coprime to } a) \;=\; 1 - P(c \text{ coprime to } a) \;=\; \frac{1}{2} . \]



23.9.10 Step V: Observe that $y_c$ Associated with c Coprime to a Will be Measured in Constant Time


Result from the CTC Theorem for Looping Algorithms 


Step V follows immediately from a past result. 


We have a circuit that we plan on using in an algorithm. Furthermore, our
algorithm is going to loop, querying the circuit each pass of the loop, obtaining a
statistically independent result each time. It produces a number c, each pass, and
for the sake of terminology we’ll consider a c that is coprime to a to be a success (and a c which is not
coprime to a to be a failure). In the last subsection we proved that P(success) = 1/2,
independent of the size, N.

These are the exact conditions of the CTC theorem for looping algorithms that we
covered at the end of our probability lesson. That theorem tells us that the algorithm
will “succeed,” i.e., yield a c coprime to a, in constant time, T, and the discussion that followed
told us what T would be,

\[ T \;=\; \left\lfloor \frac{\log \varepsilon}{\log (1/2)} \right\rfloor + 1 , \]

where ε is our desired (small) error tolerance and $\lfloor x \rfloor$ is notation for the floor of x,
or the greatest integer ≤ x.


Result from First Principles 


We can reproduce the effect of the CTC theorem without invoking it, and for practice 
with probability theory let’s go ahead and do it that way, too. 


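If we run the circuit T times, obtaining T samples,

\[ mc_1,\ mc_2,\ \ldots,\ mc_T , \]

the probability that none of the associated c were coprime to a is

\[
P\big( (c_1 \text{ not coprime to } a) \,\wedge\, (c_2 \text{ not coprime to } a) \,\wedge\, \cdots \,\wedge\, (c_T \text{ not coprime to } a) \big)
\;=\; \left(\frac{1}{2}\right)^{T} ,
\]

constant time, T, independent of N.


This means that the probability of at least one $mc_k$ having a $c_k$ coprime to a is:

\[ P(\text{at least one } c_k \text{ coprime to } a) \;=\; 1 - \left(\frac{1}{2}\right)^{T} . \]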

Conclusion 


After an adequate number of measurements (independent of a, M), which produce
$y_{c_1}, y_{c_2}, \ldots, y_{c_T}$, we can expect at least one of the $y_{c_k} = c_k m$ to correspond to $c_k$,




with $c_k$ coprime to a (that is, $y_{c_k} \in B$) with high probability. In a moment, we’ll see why it’s
so important to achieve this coprime condition.

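Example: If we measure the output of our quantum circuit repeatedly, insisting
on a $y_c$ whose c is coprime to a with P > .999999, or error tolerance $\varepsilon = 10^{-6}$, we would need

\[ T \;=\; \left\lfloor \frac{\log (10^{-6})}{\log (.5)} \right\rfloor + 1 \;=\; \left\lfloor \frac{-6}{-.30103} \right\rfloor + 1 \;=\; 19 + 1 \;=\; 20 \]

measurements.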
We can instruct our algorithm to cycle 100 times, to get an even better confidence, 
but if the data determined that only eight measurements were needed to find the 


desired cm, then it would return with a successful cm after eight passes. Our hyper- 
conservative estimate costs us nothing. 
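
The arithmetic of this example is easy to reproduce. A minimal sketch, assuming the per-pass success probability is supplied as a parameter (the function name `passes_needed` is mine, not the text's):

```python
import math


def passes_needed(eps, p_success=0.5):
    """Number of passes T so that the chance of T straight failures,
    (1 - p_success)**T, drops below the error tolerance eps.

    Hypothetical helper: floor(log(eps) / log(1 - p_success)) + 1.
    """
    return math.floor(math.log(eps) / math.log(1 - p_success)) + 1


T = passes_needed(1e-6)
print(T)                 # 20
print(0.5 ** T < 1e-6)   # True: 20 straight failures are rarer than 1 in a million
```

Because the formula involves only ε and the success probability, T does not grow with N, which is the whole point of the constant-time claim.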


23.9.11 Algorithm and Complexity Analysis (Easy Case) 
A Classical Tool: The Euclidean Algorithm 


The Euclidean Algorithm (EA) is a famous technique from number theory that takes two
input integers P, Q with P > Q in $\mathbb{Z}_{\ge 0}$ and produces the greatest common divisor of
P and Q, also represented by gcd(P, Q). This is the largest integer that divides both
P and Q, evenly (without remainder). EA(P, Q) will produce ged(P, Q) in O(log? P) 
time. We will be applying EA to N and cm, to get m’ = gcd(N,cm) so the algorithm 
will produce m’ with time complexity O(log? N) = O(log? M). 

We'll assume that EA(N, cm) delivers as promised and go into its inner workings
in a later chapter. EA is a crucial classical step in Shor’s algorithm as we see next. 


Why Finding $y_c \in B$ Solves Shor’s Problem


We proved that we will measure $y_c \in B$ with near certainty after some predetermined
number, T, of measurements. Why does such a $y_c$ solve Shor’s period-finding problem
(in the easy case)?


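Here is our circuit, after measuring cm:

[Circuit: the A register starts in $|0\rangle^n$, passes through $H^{\otimes n}$, the oracle $U_f$ and $QFT^{(N)}$, and is then measured, yielding $|cm\rangle$; the B register starts in $|0\rangle^r$ and passes through $U_f$.]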


After the measurement, this is what we know and don’t know:

    Known       Unknown
    N           c
    cm          m
                a




So the first thing we do is use EA to produce m' = gcd(N, cm). Now, if $cm \in B$ then c is coprime to a, and

$$ c \text{ coprime to } a \;\;\Longrightarrow\;\; m' = \gcd(N, cm) = \gcd(am, cm) = m. $$


That gives us the unknown m and from that we can get a:

$$ a \;=\; \frac{N}{m}. $$

But this only happens if we measure $y \in B$, which is why finding such a y will give us our period and therefore solve Shor’s problem.


How do we know whether we measured a $cm \in B$? We don’t, but we try m' anyway. We manufacture an a' using m', by

$$ a' \;=\; \frac{N}{m'}, $$

hoping a' = a, which only happens when m' = m (and, after all, finding a, not m, is our goal). We test this by asking whether f(x + a') = f(x) for any x, and x = 1 will do. a is the only number that will produce an equality here by our requirement of injective periodicity. If it does, we’re done. If not, we try T − 1 more times because, with .999999 probability, we’ll succeed after T = 20 times, no matter how big M (or N) is.

As for the cost of the test f(x + a') = f(x), that depends on f. We are only asserting relativized polynomial complexity for period-finding but know (by a future chapter) that for factoring, f will be polynomial fast ($O(\log^4(M))$, and maybe better).


We can now present the algorithm. 


Shor-like Algorithm (Easy Case) 


• Select an integer, T, that reflects an acceptable failure rate based on any known aspects of the period. E.g., for a failure tolerance of .000001, we might choose T = 20.

• Repeat the following loop at most T times.

    1. Apply Shor’s circuit.
    2. Measure output of QFT and get cm.
    3. Compute m' = EA(N, cm), and set a' = N/m'.
    4. Test a': If f(1 + a') = f(1) then a' = a, (success) break from loop.
    5. Otherwise continue to the next pass of the loop.

• If the above loop ended naturally (i.e., not from the break) after T full passes, we failed. Otherwise, we have found a.
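
The classical part of this loop (steps 3 through 5) can be sketched in a few lines. This is an illustration under stated assumptions, not the circuit itself: the list `measured_values` stands in for the quantum readings cm of steps 1 and 2, and the name `recover_period`, the toy function f(x) = x mod 8 and the sample readings are all hypothetical.

```python
from math import gcd


def recover_period(N, f, measured_values):
    """Classical post-processing of the easy-case loop.

    Each pass computes m' = gcd(N, cm), forms a' = N / m', and tests
    f(1 + a') == f(1).  Returns a' on success, None if every pass fails.
    """
    for cm in measured_values:
        m_prime = gcd(N, cm)          # step 3: m' = EA(N, cm)
        a_prime = N // m_prime        #         a' = N / m'
        if f(1 + a_prime) == f(1):    # step 4: test a'
            return a_prime            # success: break from the loop
    return None                       # loop ended naturally: failure


# Hypothetical example: N = 32, period a = 8 (so m = 4), f(x) = x mod 8.
f = lambda x: x % 8
print(recover_period(32, f, [0, 8, 12]))   # readings with c = 0, 2, 3
```

In this run the first two readings have c = 0 and c = 2, neither coprime to 8, so their candidates a' = 1 and a' = 4 fail the test; the third reading, c = 3, yields a' = 8.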




Computational Complexity (Easy Case) 


We have already seen that O(log(N)) = O(log(M)) as well as any powers of the logs, so I will use M in the following.

• The Hadamard gates are $O(\log(M))$.

• The QFT is $O(\log^2(M))$.

• Complexity of $U_f$ is same as that of f, which could be anything. We don’t count that here, but we will show in a separate lecture that for integer factoring, it is certainly at least $O(\log^4 M)$.

• The outer loop is O(T) = O(1), since T is constant.

• The classical EA sub-algorithm is $O(\log^3 M)$.

• The four non-oracle components, above, are done in series, not in nested loops, so the overall relativized complexity will be the worst case among them, $O(\log^3(M))$.

• In the case of factoring needed for RSA encryption breaking (order-finding) the actual oracle is $O(\log^4(M))$ or better, so the absolute complexity in that case will be $O(\log^4(M))$.


So the entire Shor circuit for an $f \in O(\log^4(M))$ (true for RSA/order-finding) would have an absolute complexity of $O(\log^4(M))$. Notice that, while not an exponential speed-up over the $O(\log^5(M))$ easy case classical algorithm presented, it is “faster” by a factor of log N. That’s due to the fact that the quantum circuit requires a constant time sampling, while the classical function must be sampled O(log N) times. To be fair, though, the classical algorithm was deterministic, and the quantum algorithm, probabilistic. You can debate whether or not this counts in your forums.


This completes the easy case fork-in-the-road. It contains all the way points that 
we will need for the general case without the subtle and tricky math. You are now 
well positioned to understand those additional details, so ... onward. 


23.10 Second Fork: General Case (We do not Assume a|N)


Now, a need not be a power-of-2, so the classical approach no longer admits an O(log N) algorithm. Also, if we take the integer quotient, $m = \lfloor N/a \rfloor$, there will usually be a remainder representing those “excess” integers that don’t fit into a final, full period interval.


[Figure: the interval from 0 to $N = 2^n$, marked at 0, a, 2a, 3a, ..., (m−1)a, ma, with the “excess” lying between ma and N.]

Figure 23.15: There is (possibly) a remainder for N/a, called the “excess”


23.10.1 Partitioning the Domain into Cosets 


Like the easy case, f’s injective periodicity helps us rewrite the output of the oracle’s 
B register prior to the conceptual measurement. The domain can still be partitioned 
into many disjoint cosets of size a, each of which provides a 1-to-1 sub-domain for f, 
but now there is a final coset which may not be complete: 


[Figure: the cosets R, R+a, R+2a, ..., R+(m−1)a, each of size a, laid out from 0 to ma, followed by the “partial” coset ⟨R+ma⟩ from ma to N.]

Figure 23.16: [0, N) is the union of distinct cosets of size a, except for last


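Express [0, N − 1] as the union of a union,

$$
\begin{aligned}
[0,\,N-1] \;&=\; [0,\,a-1] \;\cup\; [a,\,2a-1] \;\cup\; \cdots \;\cup\; [(m-1)a,\,ma-1] \;\cup\; \overbrace{[ma,\,N-1]}^{\text{excess}} \\
&=\; R \;\cup\; (R+a) \;\cup\; \cdots \;\cup\; \big(R+(m-1)a\big) \;\cup\; \overbrace{\langle R+ma \rangle}^{\text{partial}}
\end{aligned}
$$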


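where

$$
\begin{aligned}
R \;&\equiv\; [0,\,a-1] \;=\; \{0, 1, 2, \ldots, a-1\}, \\
a \;&=\; \text{period of } f, \text{ and} \\
m \;&=\; \lfloor N/a \rfloor, \text{ the number of times } a \text{ divides (unevenly) into } N.
\end{aligned}
$$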

Cosets. As before, R + ja is called the jth coset of R, but now we have

$$ \langle R + ma \rangle \;\subset\; [\,R + ma\,], $$

the partial mth coset of R from ma to N − 1.


[Figure: the last full coset R+(m−1)a followed by the “partial” coset ⟨R+ma⟩.]

Figure 23.17: The final coset may have size < a


We express the decomposition of our entire domain relative to a typical element x in the base coset R,

$$ [0,\,N-1] \;=\; \bigcup_{x=0}^{a-1}\; \bigcup_{j=0}^{m-1} \{\, x+ja \,\} \;\;\cup\;\; \bigcup_{x=0}^{N-ma-1} \{\, x+ma \,\}, $$

but now we had to “slap on” the partial coset, $\{\, x+ma \,\}$, to account for the possible overflow.


Notation to Deal with the “Partial Coset” 


We have to be careful about counting the family members of each element $x \in R$, i.e., those x + ja that map to the same f(x) by periodicity. We sometimes have a member in the last, partial, coset, and sometimes not. If x is among the first few integers of R, i.e., $x \in [0, N - ma)$, then there will be m + 1 partners (including x) among its kin.

However, if x is among the latter integers of R, i.e., $x \in [N - ma, a)$, then there will be only m partners (including x) among its kin.


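We'll use $\widetilde{m}$ to be either m or m + 1 depending on x,

$$
\widetilde{m} \;\equiv\;
\begin{cases}
m+1, & \text{for the “first few” } x \text{ in } [0,\,a-1] \\
m, & \text{for the “remaining” } x \text{ in } [0,\,a-1]
\end{cases}
$$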
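
A few lines of Python make the two cases concrete. This is a minimal sketch; the function name `m_tilde` and the sample values N = 128, a = 10 are assumptions chosen for illustration.

```python
def m_tilde(x, N, a):
    """How many of x, x+a, x+2a, ... lie in [0, N-1], for x in [0, a-1]."""
    m = N // a                      # integer quotient, m = floor(N/a)
    return m + 1 if x < N - m * a else m


# Hypothetical numbers: N = 128, a = 10, so m = 12 and N - ma = 8.
N, a = 128, 10
print([m_tilde(x, N, a) for x in range(a)])
# Cross-check the formula against a direct count of the kin of each x.
assert all(m_tilde(x, N, a) == len(range(x, N, a)) for x in range(a))
```

Here the “first few” x are 0 through 7, each with 13 kin, while x = 8 and x = 9 have only 12.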


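[Figure: number line marked 0, N − ma, a, ..., (m−1)a, ma, N, showing x, ..., x + (m−1)a and x + ma all inside [0, N).]

Figure 23.18: If 0 ≤ x < N − ma, a full m + 1 numbers in $\mathbb{Z}_N$ map to f(x)

[Figure: the same number line, showing x, ..., x + (m−1)a inside [0, N), with no member in the partial coset.]

Figure 23.19: If N − ma ≤ x < a, only m numbers in $\mathbb{Z}_N$ map to f(x)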


23.10.2 Rewriting the Oracle’s Output


The original expression for the oracle’s B register output was

$$ \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle^n \, |f(x)\rangle^r , $$


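and our new partition of the domain gives us a nice way to rewrite this. First, note that

$$
\left.
\begin{array}{l}
x \\
x+a \\
x+2a \\
\quad\vdots \\
x+(m-1)a \\
(\text{and sometimes} \ldots\; x+ma)
\end{array}
\right\}
\;\;\overset{f}{\longmapsto}\;\; f(x).
$$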


Now we make use of our flexible notation, $\widetilde{m}$, to keep the expressions neat without sacrificing precision of logic:

$$ \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle^n |f(x)\rangle^r \;\;\cong\;\; \sqrt{\frac{\widetilde{m}}{N}} \; \sum_{x=0}^{a-1} \left( \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} |x+ja\rangle^n \right) |f(x)\rangle^r . $$

The factor of $1/\sqrt{\widetilde{m}}$ inside the sum normalizes each term in the outer sum. However, the common amplitude remaining on the outside is harder to symbolize in a formula, which is why I used “≅” to describe it. ($\widetilde{m}$ doesn’t even make good sense outside the sum, but it gives us an idea of what the normalization factor is.) It turns out that we don’t care about its exact value. It will be some number between $\sqrt{m/N}$ and $\sqrt{(m+1)/N}$, the precise value being whatever is needed to normalize the overall state.


23.10.3 Implication of a Hypothetical Measurement of the B register Output


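We’ve seen that it helps to imagine a measurement of the oracle’s B register’s output.

[Circuit: the A register $|0\rangle^n$ passes through $H^{\otimes n}$ and the oracle $U_f$; the B register $|0\rangle^r$ passes through $U_f$ and is then (conceptually) measured.]

Each B register measurement of f(x) will be attached to not one, but $\widetilde{m}$, input A register states. The generalized Born rule tells us that measuring B will cause the collapse of A into a superposition of $\widetilde{m}$ CBS states, narrowing things down considerably.

$$ \sqrt{\frac{\widetilde{m}}{N}} \; \sum_{x=0}^{a-1} \left( \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} |x+ja\rangle^n \right) |f(x)\rangle^r \;\;\searrow\;\; \left( \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} |x_0+ja\rangle^n \right) |f(x_0)\rangle^r $$

(Here, ↘ means collapses to.)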


If after measuring the post-oracle B register we were to go on to measure the A register, it would collapse, giving us a reading of one of the $\widetilde{m}$ values, $x_0 + ja$, but that value would not get us any closer to knowing a, so as with the easy case, we don’t measure A yet. Instead, we name the collapsed (but unmeasured) superposition state in the A register $|\psi_{x_0}\rangle^n$, since it is determined by the measurement “$f(x_0)$” of the collapsed B register,

$$ |\psi_{x_0}\rangle^n \;\equiv\; \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} |x_0+ja\rangle^n . $$


Foregoing the B Register Measurement. By now we have twice seen why we can avoid this conceptual measurement yet still analyze the A register as if it had occurred. There is no harm in pretending we measured and collapsed to a $|f(x_0)\rangle^r$ first. (See the Easy Case or our lesson Simon’s Algorithm for longer explanations of the same argument.)


Motivation for Next Step 


The conceptual measurement of the B register leaves an overall state in the A register in which all the amplitudes are zero except for $\widetilde{m}$ of them, which have amplitude $1/\sqrt{\widetilde{m}}$. Furthermore, those non-zero terms are spaced at intervals of a in the N-dimensional vector: this is a “pure” periodic vector with the same period a as our function f.

In contrast to the easy case, however, $\widetilde{m}$ is sometimes m and sometimes m + 1, never mind that a is not a perfect power-of-2 or that it doesn’t divide into N evenly. All this imperfection destroys the arguments made in the easy case.


A look at the DFT of the vector whose components match the amplitudes of a typical collapsed state left in the A register (see Figure 23.20) confirms that a frequency domain measurement of $QFT^{(N)} |\psi_{x_0}\rangle$ no longer assures us of seeing one of the a numbers cm', c = 0, ..., a − 1, with m' the true (and typically non-integer) frequency N/a. As for our integer quotient $m = \lfloor N/a \rfloor$, close to the actual frequency N/a, it’s possible that none of the frequency domain points cm have high amplitudes.

[Figure: spectrum plot; the horizontal (frequency) axis runs from 0 to 120.]

Figure 23.20: The spectrum of a purely periodic vector with period 10 and frequency 12.8 = 128/10


The situation appears grim. 


Let’s look at the bright side, though. This picture of a typical DFT applied to an N dimensional vector, 0 except for amplitudes $1/\sqrt{\widetilde{m}}$ at $\widetilde{m}$ time domain points, suggests that there are still only a frequencies, $y_0, y_1, \ldots, y_{a-1}$, which have large magnitudes. And there’s even more reason for optimism. Those likely $y_c$ appear to be at least close to multiples of the “integer-ized frequency” m, i.e., they are near frequency domain points of the form cm. (Due to an artifact of the graphing software the 0 frequency appears after the array at phantom position N = 128.)




The Big Picture 


It still seems to be a good idea to apply a QFT to the A register in order to produce a state that looks like Figure 23.20.


The Key Steps 


We’ll show that there are only a likely A register measurements, $y_c$, c = 0, 1, ..., (a − 1). And even if a given $y_c \neq cm$ exactly, it will at least lead us to a cm. In fact, we can expect the resulting c to be relatively prime to a with good probability, a very desirable outcome. Here are the general steps.

• We will identify a special set of a elements, $C = \{y_c\}_{c=0}^{a-1}$, of high measurement likelihood.

• We will show that each $y_c$ is very close to (in fact, uniquely selects) a point of the form cm. We'll describe a fast (polynomial time) algorithm that takes any of the likely measured $y_c$ to its unique partner cm.

• We'll prove that we can expect to get a $y_c$ (and thus cm) with c coprime to the period a in constant time. Such a c will unlock the near-frequency, m, and therefore the period, a.


23.10.4 Effect of a Final QFT on the A Register


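The QFT is applied to the conceptually semi-collapsed $|\psi_{x_0}\rangle^n$ at the output of the oracle’s A register:

[Circuit: the A register $|0\rangle^n$ passes through $H^{\otimes n}$, the oracle $U_f$ and then $QFT^{(N)}$; the B register $|0\rangle^r$ passes through $U_f$.]

The linear QFT passes through the Σ,

$$ QFT^{(N)} \left( \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} |x_0+ja\rangle^n \right) \;=\; \frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} QFT^{(N)} |x_0+ja\rangle^n . $$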


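Note that it doesn’t matter that $\widetilde{m}$ is sometimes m and sometimes m + 1. Once it collapses, it becomes one of the two depending on which $x_0$ is selected by the measurement, and the formulas all still hold true. The QFT of each individual term is

$$
\begin{aligned}
QFT^{(N)} |x_0+ja\rangle^n \;&=\; \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega^{(x_0+ja)\,y} \, |y\rangle^n \\
&=\; \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega^{x_0 y} \, \omega^{jay} \, |y\rangle^n ,
\end{aligned}
$$

so the QFT of the entire collapsed superposition is

$$
\begin{aligned}
\frac{1}{\sqrt{\widetilde{m}}} \sum_{j=0}^{\widetilde{m}-1} \left( \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} \omega^{x_0 y} \, \omega^{jay} \, |y\rangle^n \right)
\;&=\; \frac{1}{\sqrt{\widetilde{m} N}} \sum_{y=0}^{N-1} \sum_{j=0}^{\widetilde{m}-1} \omega^{x_0 y} \, \omega^{jay} \, |y\rangle^n \\
&=\; \frac{1}{\sqrt{\widetilde{m} N}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{\widetilde{m}-1} \omega^{jay} \right) |y\rangle^n .
\end{aligned}
$$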

In this expression, the normalizing factor $1/\sqrt{\widetilde{m} N}$ is precise. That’s in contrast to the pre-collapsed state in which we had an approximate factor outside the full sum. The B register measurement “picked out” one specific $x_0$, which had a definite $\widetilde{m}$ associated with it. Whether it was m or m + 1 doesn’t matter. It is one of the two, and that value is used throughout this expression.


Summary. This is the preferred organization of our superposition state prior to sampling the final A register output:

$$ \frac{1}{\sqrt{\widetilde{m} N}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{\widetilde{m}-1} \omega^{jay} \right) |y\rangle^n $$


The next several sections explore what the probabilities say we will see when we 
measure this state. And while we analyzed it under the assumption of a prior B 
measurement, the upper (A) channel measurement won’t care about that conceptual 
measurement, as we'll see. We continue the analysis as if we had measured the B 
channel. 


23.10.5 Computation of Final Measurement Probabilities (General Case)


This general case, which I scared you into thinking would be a mathematical horror story, has been a relative cakewalk so far. About all we had to do was replace the firm m with the slippery $\widetilde{m}$, and everything went through without incident. That’s about to change.

In the easy case, we were able to make the majority of terms in our sum vanish (all but 1-in-m). Let’s review how we did that. We noted that $\omega \equiv \omega_N$, the primitive Nth root, so

$$ \omega^N \;=\; 1. $$

Then we replaced N with ma, to get

$$ \omega^{ma} \;=\; \left(\omega^{a}\right)^{m} \;=\; 1, $$

and realized that this implied that $\omega^a$ was a primitive mth root of unity. From there we were able to get massive cancellation due to the facts we developed about sums of roots-of-unity.

The problem, now, is that we cannot replace N with ma. We have an $\widetilde{m}$, but even resolving that to m or m + 1 won’t work, because neither one divides N evenly (by the general case hypothesis). So we’ll never be able to manufacture an mth root of unity at this point and cannot watch those big sums dissolve before our eyes. So sad.


We can still get what we need, though, and have fun with math, so let’s rise to 
the challenge. 


As with the easy case our job is to analyze the final (post QFT) A register superposition. While none of the terms will politely disappear the way they did in the easy case, we will find that certain y states will be much more likely than others, and this will be our savior.

Computing the final measurement probabilities will require the following five steps.

1. identify (without proof) a special set of a elements, $C = \{y_c\}_{c=0}^{a-1}$, of high measurement likelihood,

2. prove that the values in $C = \{y_c\}_{c=0}^{a-1}$ have high measurement likelihood,

3. associate $\{y_c\}_{c=0}^{a-1}$ with $\{c/a\}_{c=0}^{a-1}$,

4. describe an $O(\log^3 N)$ algorithm that will produce c/a from $y_c$, and

5. observe that $y_c$ associated with c coprime to a will be measured in constant time.


23.10.6 STEP I: Identify (Without Proof) a Special Set of a Elements, $C = \{y_c\}_{c=0}^{a-1}$, of High Measurement Likelihood


In this step, we will merely describe the subset of y that we want to measure. In the 
next step, we’ll provide the proof. 


In the easy case we measured y, which had the special form

$$ y \;=\; cm, \qquad c = 0, 1, \ldots, a-1, $$

with 100% certainty in a single measurement. From there we tested whether the c was coprime to a (which it was with high probability), and so on. This time we can’t be 100% sure of anything even after post-processing with the QFT, but that’s normal for quantum algorithms: we often have to “work the numbers” and be satisfied to get what we want in constant or polynomial time. I claim that in the general case we will measure an equally small subset of y, again a in all, that we label

$$ y_c, \qquad c = 0, 1, \ldots, a-1, $$

with probability P = 1 − ε in O(1) measurements. (Whenever you see me use an ε, it represents a positive number which is as small as we want, say .000000001, for example.) The following construction will lead us to these special a values.


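Consider the (very) long line [0, aN − 1].

[Figure: a line marked at 0, N, 2N, 3N, ..., (a−1)N, aN.]

Figure 23.21: A very long line consisting of a copies of $N = 2^n$

Around the a integral points on this line,

$$ 0,\; N,\; 2N,\; \ldots,\; cN,\; \ldots,\; (a-1)N, $$

place (relatively small) half-open intervals of width a:

$$
\left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right),\quad
\left[N-\tfrac{a}{2},\, N+\tfrac{a}{2}\right),\quad
\left[2N-\tfrac{a}{2},\, 2N+\tfrac{a}{2}\right),\quad \ldots,\quad
\left[cN-\tfrac{a}{2},\, cN+\tfrac{a}{2}\right),\quad \ldots,\quad
\left[(a-1)N-\tfrac{a}{2},\, (a-1)N+\tfrac{a}{2}\right).
$$

Each interval contains exactly one integral multiple, ya, of a in it. We'll label the multiplier that gets us into the cth interval $y_c$. ($y_0$ is easily seen to be 0.)

$$
y_0\, a \in \left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right), \quad \ldots, \quad
y_c\, a \in \left[cN-\tfrac{a}{2},\, cN+\tfrac{a}{2}\right), \quad \ldots, \quad
y_{a-1}\, a \in \left[(a-1)N-\tfrac{a}{2},\, (a-1)N+\tfrac{a}{2}\right)
$$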

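[Figure: near each of the points (c−1)N, cN and (c+1)N, a single multiple of a (namely $y_{c-1}a$, $y_c a$ and $y_{c+1}a$) falls inside the width-a interval centered there.]

Figure 23.23: Exactly one integral multiple of a falls in each interval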


Since $y_c a < aN$ for all c = 0, 1, ..., (a − 1), we are assured $y_c < N$, so it is a possible result of our measurement after the QFT.

Claim. The $\{y_c\}$ just described are the a values most likely to be measured. We'll see that with probability P = 1 − ε, arbitrarily close to 1, we will measure one of these $y_c$ in constant time, T, independent of N. (The proof appears in the next step. In this step we only want to establish our vocabulary and goals.)


Example 1 


In this first of two examples we illustrate the selection of $y_c$ for an actual periodic function defined on integers [0, N − 1], where $N = 2^5 = 32$, and the period a = 3. (Remember, it doesn’t matter what the values f(k) are, so long as we’re told it is periodic.)

• c = 0: The center of the interval is 0N = 0 · 32 = 0. We seek $y_0$ such that

$$ 3y_0 \in [-1.5,\, 1.5) \;\;\Longrightarrow\;\; y_0 = 0, \quad \text{since } 3 \cdot 0 = 0 \in [-1.5,\, 1.5). $$

• c = 1: The center of the interval is 1N = 1 · 32 = 32. We seek $y_1$ such that

$$ 3y_1 \in [30.5,\, 33.5) \;\;\Longrightarrow\;\; y_1 = 11, \quad \text{since } 3 \cdot 11 = 33 \in [30.5,\, 33.5). $$

• c = 2: The center of the interval is 2N = 2 · 32 = 64. We seek $y_2$ such that

$$ 3y_2 \in [62.5,\, 65.5) \;\;\Longrightarrow\;\; y_2 = 21, \quad \text{since } 3 \cdot 21 = 63 \in [62.5,\, 65.5). $$

• Etc.
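
The selection rule used in this example can be written as a one-line computation. What follows is a minimal sketch; the function name `special_y` is an assumption, and integer ceiling division is used so that no floating-point rounding enters.

```python
def special_y(c, N, a):
    """The unique integer y with y*a in the half-open interval [cN - a/2, cN + a/2)."""
    # y is the ceiling of (cN - a/2)/a = (2cN - a)/(2a), done in exact integer arithmetic.
    return -((a - 2 * c * N) // (2 * a))


print([special_y(c, 32, 3) for c in range(3)])   # the y_c of Example 1: [0, 11, 21]
```

The ceiling picks the first multiple of a at or beyond the left endpoint cN − a/2, and because the interval has width exactly a, that multiple is the only one inside it.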
Example 2 


In this second example, we test the claim that the $y_c$ so described really do have high relative measurement likelihood for an actual periodic function.

Consider a function that has period 10 defined on a domain of size $128 = 2^7$. Our problem variables for this function become

$$
\begin{aligned}
n \;&=\; 7, \\
N \;&=\; 2^n \;=\; 128, \\
a \;&=\; 10 \quad \text{and} \\
m \;&=\; \left\lfloor \tfrac{128}{10} \right\rfloor \;=\; 12.
\end{aligned}
$$

Let’s say that we measured the B register and got the value $f(x_0)$, corresponding to $x_0 = 3$. For this $x_0$,

$$ \widetilde{m} \;=\; m + 1 \;=\; 13, $$

since

$$ x_0 + ma \;=\; 3 + 12 \cdot 10 \;=\; 123 \;<\; 128. $$
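According to the above analysis, the full pre-measurement superposition,

$$ \frac{1}{\sqrt{128}} \sum_{x=0}^{127} |x\rangle^7 |f(x)\rangle^r \;\;\cong\;\; \sqrt{\frac{\widetilde{m}}{128}} \; \sum_{x=0}^{9} \frac{1}{\sqrt{\widetilde{m}}} \Big( |x\rangle^7 + |x+10\rangle^7 + |x+20\rangle^7 + \cdots + \overbrace{|x+120\rangle^7}^{\text{sometimes}} \Big) |f(x)\rangle^r , $$

when subjected to a B register measurement will collapse to

$$ |\psi_3\rangle^7 \, |f(3)\rangle^r \;=\; \frac{1}{\sqrt{13}} \Big( |3\rangle^7 + |13\rangle^7 + |23\rangle^7 + \cdots + |123\rangle^7 \Big) |f(3)\rangle^r , $$

leaving the following vector in the A register (which I have “stacked” so as to align the 13 non-zero coordinates):

    ( 0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0, 0, 0,
      0, 0, 0, .27735, 0, 0, 0, 0 )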


We are interested in learning f’s period and, like the easy case, the way to get at it is by looking at this state vector’s spectrum, so we take the QFT. Now, unlike the easy case, this vector’s QFT will not create lots of 0 amplitudes; generally all N = 128 of them will be non-zero. That’s because the resulting sum

$$ QFT^{(128)} |\psi_3\rangle^7 \;=\; \frac{1}{\sqrt{13 \cdot 128}} \sum_{y=0}^{127} \omega^{3y} \left( \sum_{j=0}^{12} \omega^{10jy} \right) |y\rangle^7 $$

did not admit any cancellations or simplification. Instead, the above claim (which we will prove in the next section) is that for $x_0 = 3$ only a = 10 of them will be likely, the special $\{y_c\}$, for c = 0, 1, 2, ..., 9, we described in our last analysis. Let’s put our money where our mouth is and at least show this to be true for the one function under consideration.

We take three $y_c$ as examples: $y_4$, $y_5$ and $y_6$. We'll do this in two stages. First we identify the three $y_c$ values. Next, we graph the probabilities of $QFT^{(128)} |\psi_3\rangle^7$ around those three values to see how they compare with nearby y values.


Stage 1. Compute $y_4$, $y_5$ and $y_6$

• c = 4: The center of the interval is 4N = 4 · 128 = 512. We seek $y_4$ such that

$$ 10y_4 \in [507,\, 517) \;\;\Longrightarrow\;\; y_4 = 51, \quad \text{since } 10 \cdot 51 = 510 \in [507,\, 517). $$

• c = 5: The center of the interval is 5N = 5 · 128 = 640. We seek $y_5$ such that

$$ 10y_5 \in [635,\, 645) \;\;\Longrightarrow\;\; y_5 = 64, \quad \text{since } 10 \cdot 64 = 640 \in [635,\, 645). $$

• c = 6: The center of the interval is 6N = 6 · 128 = 768. We seek $y_6$ such that

$$ 10y_6 \in [763,\, 773) \;\;\Longrightarrow\;\; y_6 = 77, \quad \text{since } 10 \cdot 77 = 770 \in [763,\, 773). $$


Stage 2. Look at the Graph of $\left| QFT^{(128)} |\psi_3\rangle \right|^2$ Around These Three y

Here is a portion of the graph of the QFT’s absolute value squared showing the probabilities of measuring over 35 different y values in the frequency domain. It exhibits in a dramatic way how much more likely the $y_c$ are to be detected than are the non-$y_c$ values. (See Figure 23.24.) Even though the non-$y_c$ have non-zero probabilities of measurement, they barely register except for a few.

[Figure: measurement probability plotted against y, for y from 45 to 80.]

Figure 23.24: Probabilities of measuring $y_4 = 51$, $y_5 = 64$ or $y_6 = 77$ are dominant.


If we apply the math to all 10 $y_c$ for this $x_0$ (which, remember, is 3), we find that the measurements in the frequency domain will usually yield values that lie in the special set

$$ C \;=\; \{\, 0,\; 13,\; 26,\; 38,\; 51,\; 64,\; 77,\; 90,\; 102,\; 115 \,\} $$

[Exercise. Show this for the $y_c$ not computed in our example.]
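
The claim can also be spot-checked numerically for this one function. The sketch below is a plain-Python illustration rather than part of the proof: the function name `spectrum` is an assumption, and the probabilities are computed directly from the definition of the QFT applied to the collapsed state, using the root of unity $e^{2\pi i/N}$.

```python
import cmath


def spectrum(N, a, x0):
    """Measurement probabilities P(y), y = 0..N-1, after the QFT of |psi_x0>."""
    kin = range(x0, N, a)        # x0, x0+a, x0+2a, ... inside [0, N-1]
    m_t = len(kin)               # this is m-tilde for the chosen x0
    probs = []
    for y in range(N):
        # Amplitude of |y>: sum of omega^(x*y) over the kin, with omega = e^(2 pi i / N).
        amp = sum(cmath.exp(2j * cmath.pi * ((x * y) % N) / N) for x in kin)
        probs.append(abs(amp) ** 2 / (m_t * N))
    return probs


probs = spectrum(128, 10, 3)
top_ten = sorted(sorted(range(128), key=lambda y: probs[y], reverse=True)[:10])
print(top_ten)                            # the ten most likely y values
print(sum(probs[y] for y in top_ten))     # total probability they carry
```

The ten most probable outcomes coincide with the set C above; the remaining 118 frequencies share what is left of the probability.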


Measuring along the frequency basis means applying the basis-transforming QFT after the oracle and explains its presence in the circuit.


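Introducing the $\widehat{y}_c$

To assist our analysis we’ll define a set of $\widehat{y}_c$ associated with the desired $y_c$ such that the $\widehat{y}_c$ all live in the interval $\left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right)$ (Figure 23.25).

We do this by subtracting cN from each $y_c a$ to “bring it” into the base interval $\left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right)$. More precisely,

$$ \widehat{y}_c \;\equiv\; y_c a - cN \;\in\; \left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right), \qquad c = 0, 1, 2, \ldots, a-1, $$

which can be stated using modular arithmetic by

$$ \widehat{y}_c \;\equiv\; y_c a \pmod{N}. $$

While we will be looking to measure the $y_c$, it will be important to remember that they correspond (≡ mod N) to these $\widehat{y}_c \in \left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right)$.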


[Figure: several $\widehat{y}_c$ plotted on a number line between −a/2 and +a/2, on either side of 0.]

Figure 23.25: The $\widehat{y}_c$ all fall in the interval [−a/2, a/2)

This is the vocabulary we will need. Next, we’ll take a lengthy (but leisurely) mathematical cruise to prove our claim of high probability measurement for the set C.


23.10.7 STEP II: Prove that the Values in $C = \{y_c\}_{c=0}^{a-1}$ Have High Measurement Likelihood


We gave a convincing argument that the a distinct frequency domain values $\{y_c\}$ constructed in the last section do produce highly likely measurement results, but we didn’t even try to prove it. It’s time to do that now. This is the messiest math of the lecture. I’ll try to make it clear by offering pictures and gradual steps.


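When we last checked, the (conceptual) measurement/collapse of the B register to state $|f(x_0)\rangle$ left the post-QFT A register in the state

$$ \frac{1}{\sqrt{\widetilde{m} N}} \sum_{y=0}^{N-1} \omega^{x_0 y} \left( \sum_{j=0}^{\widetilde{m}-1} \omega^{jay} \right) |y\rangle^n , $$

so

$$
\begin{aligned}
P(\text{measurement yields } y) \;&=\; \frac{1}{\widetilde{m} N} \left| \omega^{x_0 y} \right|^2 \left| \sum_{j=0}^{\widetilde{m}-1} \omega^{jay} \right|^2 \\
&=\; \frac{1}{\widetilde{m} N} \left| \sum_{j=0}^{\widetilde{m}-1} \omega^{jay} \right|^2 .
\end{aligned}
$$

Letting

$$ \mu \;\equiv\; \omega^{ay}, $$

the summation factor on the right (prior to magnitude-squaring) is

$$ \sum_{j=0}^{\widetilde{m}-1} \mu^{j} \;=\; 1 + \mu + \mu^2 + \cdots + \mu^{\widetilde{m}-1} \;=\; \frac{\mu^{\widetilde{m}} - 1}{\mu - 1} $$

(valid whenever $\mu \neq 1$). Having served its purpose, μ can now be jettisoned, and

$$ \frac{\mu^{\widetilde{m}} - 1}{\mu - 1} \;=\; \frac{\omega^{ay\widetilde{m}} - 1}{\omega^{ay} - 1} \;=\; \frac{e^{(2\pi i / N)\, a y \widetilde{m}} - 1}{e^{(2\pi i / N)\, a y} - 1} \;=\; \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1}, $$

where we are defining the “angle”

$$ \theta_y \;\equiv\; \frac{2\pi a y}{N}. $$

In other words, the probability of measuring any y is

$$ P(\text{measurement yields } y) \;=\; \frac{1}{\widetilde{m} N} \left| \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1} \right|^2 . $$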

The next several screens are filled with the math to estimate the magnitude of the fraction,

$$ \left| \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1} \right| . $$

We will do it by bounding numerator and denominator, separately. It’s a protracted side-trip because I go through each step slowly, so don’t be intimidated by the length of the derivation.


A General Bound for $|e^{i\phi} - 1|$

It will help to bracket the expression:

$$ ? \;\le\; \left| e^{i\phi} - 1 \right| \;\le\; ? $$


An Upper Bound for $|e^{i\phi} - 1|$

In the complex plane, $|e^{i\phi} - 1|$ is the length of the chord from 1 to $e^{i\phi}$, and this is always ≤ the arc length from 1, counterclockwise, to $e^{i\phi}$, all on the unit circle. But the arc length is, by definition, the (absolute value of the) angle, φ, itself, so:

$$ \left| e^{i\phi} - 1 \right| \;\le\; |\phi| . $$

Figure 23.26: The chord is shorter than the arc length


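A Lower Bound for $|e^{i\phi} - 1|$

By the Euler identity,

$$ 1 - e^{i\phi} \;=\; 1 - (\cos\phi + i \sin\phi). $$

By the addition laws for sine and cosine applied to $\phi = \frac{\phi}{2} + \frac{\phi}{2}$,

$$ \cos\phi \;=\; 1 - 2\sin^2\!\left(\tfrac{\phi}{2}\right), \qquad \sin\phi \;=\; 2 \sin\!\left(\tfrac{\phi}{2}\right) \cos\!\left(\tfrac{\phi}{2}\right). $$

So

$$
\begin{aligned}
1 - e^{i\phi} \;&=\; 2\sin^2\!\left(\tfrac{\phi}{2}\right) - 2i \sin\!\left(\tfrac{\phi}{2}\right) \cos\!\left(\tfrac{\phi}{2}\right) \\
&=\; -2i \sin\!\left(\tfrac{\phi}{2}\right) \left( \cos\!\left(\tfrac{\phi}{2}\right) + i \sin\!\left(\tfrac{\phi}{2}\right) \right).
\end{aligned}
$$

Noting that

$$ |-i| \;=\; 1 \qquad \text{and} \qquad \left| \cos\!\left(\tfrac{\phi}{2}\right) + i \sin\!\left(\tfrac{\phi}{2}\right) \right| \;=\; 1, $$

we have shown that

$$ \left| e^{i\phi} - 1 \right| \;=\; 2 \left| \sin\!\left(\tfrac{\phi}{2}\right) \right| . $$

[Figure: graphs of |sin(x/2)| and |x/π|.]

Figure 23.27: |sin(x/2)| lies above |x/π| in the interval (−π, π)

By simple calculus and solving φ/π = sin(φ/2) for φ, or noting where the graph of the sine curve and the line intersect, we conclude that when φ is in the interval [−π, π], we can bound 2|sin(φ/2)| from below,

$$ \frac{2|\phi|}{\pi} \;\le\; 2 \left| \sin\!\left(\tfrac{\phi}{2}\right) \right| , \qquad \text{for } \phi \in [-\pi, \pi]. $$

This gives us a lower bound for $|e^{i\phi} - 1|$:

$$ \frac{2|\phi|}{\pi} \;\le\; \left| e^{i\phi} - 1 \right| , \qquad \text{for } \phi \in [-\pi, \pi]. $$

Combining both bounds when φ is in the interval [−π, π], we have bracketed the expression under study,

$$ \frac{2|\phi|}{\pi} \;\le\; \left| e^{i\phi} - 1 \right| \;\le\; |\phi| , \qquad \text{for } \phi \in [-\pi, \pi]. $$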
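
A quick numerical sanity check of this bracket may be reassuring before we lean on it. The sketch below samples angles across [−π, π]; the sample count and the tolerance `TOL` are arbitrary choices made for illustration, and a check like this is of course no substitute for the argument above.

```python
import cmath
import math

# Sample phi across [-pi, pi] and check  2|phi|/pi <= |e^{i phi} - 1| <= |phi|.
TOL = 1e-12   # slack for floating-point round-off, which matters at the endpoints
for k in range(-1000, 1001):
    phi = math.pi * k / 1000
    chord = abs(cmath.exp(1j * phi) - 1)
    assert abs(chord - 2 * abs(math.sin(phi / 2))) < TOL   # chord = 2|sin(phi/2)|
    assert 2 * abs(phi) / math.pi <= chord + TOL           # lower bound
    assert chord <= abs(phi) + TOL                         # upper bound
print("bracket holds at all sampled angles")
```

The lower bound is tight at φ = ±π, where the chord is a full diameter of length 2, which is why a small tolerance is included.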


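Applying Both Bounds to the Fraction

Our goal was to estimate

$$ \left| \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1} \right| , $$

especially when $y = y_c$. We prefer to work with the $\widehat{y}_c$, which are in an a-sized interval around 0:

$$ \widehat{y}_c \;=\; y_c a - cN \;\in\; \left[-\tfrac{a}{2},\, +\tfrac{a}{2}\right), \qquad c = 0, 1, 2, \ldots, a-1. $$

So, let’s first convert the exponentials in the numerator and denominator using the above relation between those special $\{y_c\}$ (which we hope to measure with high probability) and their corresponding mod N equivalents, the $\{\widehat{y}_c\}$, close to the origin.

$$
\begin{aligned}
e^{i \theta_{y_c}} \;&=\; e^{i (2\pi a y_c / N)} \;=\; e^{i (2\pi [\widehat{y}_c + cN] / N)} \;=\; e^{i (2\pi \widehat{y}_c / N)} \, e^{i 2\pi c} \\
&=\; e^{i (2\pi \widehat{y}_c / N)} \;=\; e^{i \theta_{\widehat{y}_c} / a}
\end{aligned}
$$

This allows us to rewrite the absolute value of the amplitudes we wish to estimate as

$$ \left| \frac{e^{i \theta_{y_c} \widetilde{m}} - 1}{e^{i \theta_{y_c}} - 1} \right| \;=\; \left| \frac{e^{i \theta_{\widehat{y}_c} \widetilde{m} / a} - 1}{e^{i \theta_{\widehat{y}_c} / a} - 1} \right| . $$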

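We want a lower bound for this magnitude. This way we can see that the likelihood of measuring these relatively few ys is high. To that end, we get

• a lower bound for the numerator, and

• an upper bound for the denominator.


Upper Bound for Denominator

The denominator is easy. We derived an upper bound for all angles, φ, so:

$$ \left| e^{i \theta_{\widehat{y}_c} / a} - 1 \right| \;\le\; \left| \frac{\theta_{\widehat{y}_c}}{a} \right| . $$


Lower Bound for Numerator

The numerator is

$$ \left| e^{i \theta_{\widehat{y}_c} \widetilde{m} / a} - 1 \right| \;=\; \left| e^{i\, 2\pi \widetilde{m}\, \widehat{y}_c / N} - 1 \right| . $$

Remember that $\widetilde{m}$ is sometimes m and sometimes m + 1.


Sub-Case 1: $\widetilde{m}$ is m

When $\widetilde{m}$ is m, we have things easy: $2\pi m \widehat{y}_c / N \in (-\pi, \pi)$ because

$$ -\frac{a}{2} \;\le\; \widehat{y}_c \;<\; \frac{a}{2}, $$

or

$$ -\frac{\pi a m}{N} \;\le\; \frac{2\pi m}{N}\, \widehat{y}_c \;<\; \frac{\pi a m}{N}. $$

And, since

$$ \frac{am}{N} \;<\; 1, $$

we get

$$ -\pi \;<\; \frac{2\pi m}{N}\, \widehat{y}_c \;<\; \pi. $$

This allows us to invoke the lower bound we derived when φ ∈ [−π, π], namely

$$ \left| e^{i\, 2\pi m\, \widehat{y}_c / N} - 1 \right| \;\ge\; \frac{2 \left| 2\pi m\, \widehat{y}_c / N \right|}{\pi}. $$

Re-applying the notation

$$ \theta_{\widehat{y}_c} \;=\; \frac{2\pi a\, \widehat{y}_c}{N}, $$

we write it as

$$ \left| e^{i\, 2\pi m\, \widehat{y}_c / N} - 1 \right| \;\ge\; \frac{2}{\pi} \left| \frac{m}{a}\, \theta_{\widehat{y}_c} \right| . $$

Combining the bounds for the numerator and denominator (in the case where $\widetilde{m} = m$), we end up with

$$ \left| \frac{e^{i \theta_{\widehat{y}_c} m / a} - 1}{e^{i \theta_{\widehat{y}_c} / a} - 1} \right| \;\ge\; \frac{\dfrac{2}{\pi} \left| \dfrac{m}{a}\, \theta_{\widehat{y}_c} \right|}{\left| \dfrac{\theta_{\widehat{y}_c}}{a} \right|} \;=\; \frac{2m}{\pi}. $$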

Sub-Case 2: $\widetilde{m}$ is m + 1 (Deferred)


That only worked for $\widetilde{m} = m$; the bounds argument we presented won’t hold up when $\widetilde{m} = m + 1$, but we'll deal with that kink in a moment. Let’s pretend the above bound works for all $\widetilde{m}$, both m and m + 1, and finish computing the probabilities under this white lie. Then we’ll come back and repair the argument to also include $\widetilde{m} = m + 1$.


Simulating the Probabilities Using the $\widetilde{m} = m$ Bounds


If the above bounds worked for all $\widetilde{m}$, we could show that this leads to a desired lower bound for our probabilities of getting a desired $y \in C$ in constant time using the following argument.

For any y, we said

$$ P(\text{measurement yields } y) \;=\; \frac{1}{\widetilde{m} N} \left| \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1} \right|^2 , $$

but for $y = y_c$, one of the a special ys that lie in the neighborhoods of the cNs, we can substitute our new-found lower bound for the magnitude of the fraction. (Remember, we are allowing that the bound holds for all $\widetilde{m}$, even though we only proved it for $\widetilde{m} = m$), so

$$ P(\text{measurement yields } y_c) \;\ge\; \frac{1}{\widetilde{m} N} \cdot \frac{4 \widetilde{m}^2}{\pi^2} \;=\; \frac{\widetilde{m}}{N} \cdot \frac{4}{\pi^2} \;\ge\; \frac{m}{N} \cdot \frac{4}{\pi^2}. $$

The last inequality holds because some of the $\widetilde{m}$ are m + 1 and some are m. Now, there are a such $y_c$, namely, $y_0, y_1, \ldots, y_{a-1}$, each describing a distinct, independent, measurement (we can’t get both $y_{32}$ and $y_{61}$). Thus, the probability of getting any one of them is the sum of the individual probabilities. Since those individual probabilities are all bounded below by the same constant, we can multiply it by a to get the collective lower bound,

$$ P(\text{measurement yields one of the } y_c) \;\ge\; \frac{am}{N} \cdot \frac{4}{\pi^2}. $$

[Exercise. Make this last statement precise.]


In this hard case, allowing $a \nmid N$, we defined m to be the unique integer satisfying ma < N < (m + 1)a, or quoting only the latter inequality,

$$ (m+1)\,a \;>\; N. $$

It’s now time to harvest that “weak additional assumption” requiring at least two periods of a to fit into M,

$$ a \;<\; \frac{M}{2}, $$

which also implies

$$ a \;<\; \frac{N}{2}. $$

We combine those to conclude

$$ \frac{am}{N} \;>\; \frac{1}{2}. $$

[Exercise. Do it.]

We can plug that result into our probability estimate to get

$$ P(\text{measurement yields one of the } y_c) \;>\; \frac{2}{\pi^2}. $$

Note-to-file. If a << M < N, as is often the case, then

$$ \frac{am}{N} \;\approx\; 1, $$

and the above lower bound improves to

$$ P(\text{measurement yields one of the } y_c) \;>\; \frac{4}{\pi^2} - \varepsilon, $$

where ε → 0 as m → ∞.

[Exercise. Compute the lower bound when m = 100.]
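
For one concrete function we can compare these bounds with the probabilities themselves. The sketch below reuses the numbers of Example 2 (N = 128, a = 10) and checks every possible $x_0$; the helper names `prob` and `special_y` are assumptions, and a spot check of this kind illustrates the estimate rather than proving it.

```python
import cmath
import math


def prob(y, N, a, x0):
    """P(y) for the collapsed state built on x0, computed directly from the sum."""
    kin = range(x0, N, a)        # x0, x0+a, ... inside [0, N-1]; its length is m-tilde
    total = sum(cmath.exp(2j * math.pi * ((x * y) % N) / N) for x in kin)
    return abs(total) ** 2 / (len(kin) * N)


def special_y(c, N, a):
    """The unique y with y*a in [cN - a/2, cN + a/2)."""
    return -((a - 2 * c * N) // (2 * a))


N, a = 128, 10                   # hypothetical values, reused from Example 2
m = N // a
ys = [special_y(c, N, a) for c in range(a)]
for x0 in range(a):
    each = [prob(y, N, a, x0) for y in ys]
    assert min(each) >= 4 * m / (math.pi ** 2 * N)    # bound for each single y_c
    assert sum(each) >= 2 / math.pi ** 2              # collective bound
print("bounds hold for every x0")
```

In this example the collective probability sits well above the guaranteed 2/π², which is to be expected from a worst-case estimate.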


Our last lower bound for the “p of success,” 2/π² > 0, was independent of M (or N or a), so by repeating the random measurement a fixed number, T, times, we can assure we measure one of those $y_c$ with any desired level of confidence. This follows from the CTC theorem for looping algorithms (end of probability lesson), but for a derivation that does not rely on any theorems, compute directly,

$$
\begin{aligned}
P(\text{we do not get a } y_c \text{ after } T \text{ measurements})
\;&=\; P\left( \bigwedge_{k=1}^{T} \neg\big(\text{measurement } k \text{ yields a } y_c\big) \right) \\
&=\; \prod_{k=1}^{T} P\Big( \neg\big(\text{measurement } k \text{ yields a } y_c\big) \Big) \\
&\le\; \left( 1 - \frac{2}{\pi^2} \right)^{T} .
\end{aligned}
$$

The last product can be made arbitrarily small by making T large enough, independent of N, M, a, etc. This would prove the claim of Step 2 if we could only use the bound we got for $\widetilde{m} = m$. But we can’t. So we must soldier on ...
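
To get a feel for how large T must be under this bound, here is a minimal sketch. The function name `repetitions` and the tolerance 1e-6 are assumptions for illustration, and the success probability plugged in is the provisional lower bound 2/π² just discussed.

```python
import math


def repetitions(p_success, eps):
    """Smallest T for which (1 - p_success)**T < eps."""
    return math.floor(math.log(eps) / math.log(1 - p_success)) + 1


p = 2 / math.pi ** 2            # lower bound on the one-shot success probability
T = repetitions(p, 1e-6)        # eps = 1e-6 is a hypothetical tolerance
print(T, (1 - p) ** T)
```

The point is that T depends only on the tolerance and the constant 2/π², not on N, M or a.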


Repeating the Estimates Using the $\widetilde{m} = m + 1$ Bounds


We now repair the convenient-but-incorrect assumption that $\widetilde{m}$ is always m. To do so, let’s repeat the estimates, but do so for $\widetilde{m} = m + 1$. This is a little harder sub-case. When we’re done, we’ll combine both cases.

Remember where $\widetilde{m}$ came from, and why it could be either m or m + 1. In our general (hard) case, a does not divide N evenly; the first few x ∈ R = [0, a − 1] will generate m + 1 mod-a relatives within [0, N − 1] that map to the same f(x), while the last several x ∈ R = [0, a − 1] will only produce m such mod-a relatives within [0, N − 1]. $\widetilde{m}$ represented however many mod-a relatives of x fit into the [0, N − 1] interval: m for some x, and m + 1 for others.


We retrace our steps.


The probability of measuring the state |y⟩ is the amplitude’s magnitude squared,

$$ P(\text{measurement yields } y) \;=\; \frac{1}{\widetilde{m} N} \left| \frac{e^{i \theta_y \widetilde{m}} - 1}{e^{i \theta_y} - 1} \right|^2 . $$

So far, we’re okay; we had not yet made any assumption about the particular choice of $\widetilde{m}$. To bound this probability, we went on to get an estimate for the fraction

$$ \left| \frac{e^{i \theta_{\widehat{y}_c} \widetilde{m} / a} - 1}{e^{i \theta_{\widehat{y}_c} / a} - 1} \right| . $$

The denominator’s upper bound worked for any θ, so no change needed there. But the numerator’s lower bound has to be recomputed, this time, under the harsher assumption that $\widetilde{m} = m + 1$.


Earlier, we showed that $2\pi m \hat y_c / N$ was confined to the interval $(-\pi, \pi)$, which gave us our desired result. Now, however, we replace $m$ with $m+1$, and we’ll see that $2\pi(m+1)\hat y_c / N$ won’t be restricted to $(-\pi, \pi)$. What, exactly, are its limits? Start, as before, with

\[
-\frac{a}{2} \;\le\; \hat y_c \;<\; \frac{a}{2},
\]

but we need to multiply by a different constant this time,

\[
-\frac{\pi a (m+1)}{N} \;\le\; \frac{2\pi (m+1)\hat y_c}{N} \;<\; \frac{\pi a (m+1)}{N} .
\]

By our working assumption, $a/N < 1/2$ (which continues to reap benefits), we can assert that

\[
\frac{a(m+1)}{N} \;=\; \frac{am}{N} + \frac{a}{N} \;<\; 1 + \frac{1}{2} \;=\; \frac{3}{2},
\]

giving us the new absolute bounds

\[
-\frac{3}{2}\pi \;<\; \frac{2\pi (m+1)\hat y_c}{N} \;<\; \frac{3}{2}\pi .
\]


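The next step would have been to apply the general result we found for any $\phi \in [-\pi, \pi]$, namely that

\[
\frac{|\phi|}{\pi} \;\le\; \left| \sin\left( \frac{\phi}{2} \right) \right| , \quad \text{for } \phi \in [-\pi, \pi].
\]

But now our “$\phi$” $= 2\pi \hat y_c (m+1)/N$ lives in the enlarged interval $[-(3/2)\pi, (3/2)\pi]$, so that general result no longer applies. Sigh. We have to go back and get a new general result for this larger interval. It turns out that the old bound merely needs to be multiplied by a constant:

\[
\frac{K|\phi|}{\pi} \;\le\; \left| \sin\left( \frac{\phi}{2} \right) \right| , \quad \text{for } \phi \in \left[ -\tfrac{3}{2}\pi, \tfrac{3}{2}\pi \right],
\]

where $K$ is the constant

\[
K \;=\; \frac{2 \sin\left( \frac{3\pi}{4} \right)}{3} \;\approx\; .4714 .
\]

Figure 23.28: $|\sin(x/2)|$ lies above $|Kx/\pi|$ in the interval $(-1.5\pi, 1.5\pi)$


Again, this can be done using calculus and solving $K\phi/\pi = \sin(\phi/2)$ for $K$ at the endpoint $\phi = 3\pi/2$. Visually, you can see where the graphs of the sine and the line intersect, which confirms that assertion. Summarizing the general result in the expanded interval,

\[
\left| \sin\left( \frac{\phi}{2} \right) \right| \;\ge\; \frac{K|\phi|}{\pi} , \quad \text{for } \phi \in \left[ -\tfrac{3}{2}\pi, \tfrac{3}{2}\pi \right].
\]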
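
For readers who prefer a numerical check to a graph, the sketch below samples the closed interval $[-\tfrac32\pi, \tfrac32\pi]$ and tests the inequality $|\sin(\phi/2)| \ge K|\phi|/\pi$ at each sample. The helper name `bound_holds`, the grid size, and the tiny rounding tolerance are assumptions made for this illustration.

```python
import math

# The constant K from the text: 2*sin(3*pi/4)/3.
K = 2 * math.sin(3 * math.pi / 4) / 3

def bound_holds(phi, k=K):
    """True when |sin(phi/2)| >= k*|phi|/pi, allowing for floating-point rounding."""
    return abs(math.sin(phi / 2)) >= k * abs(phi) / math.pi - 1e-12

# Evenly spaced samples of [-1.5*pi, 1.5*pi], endpoints included.
grid = [(-1.5 + 3.0 * i / 10000) * math.pi for i in range(10001)]
print(round(K, 4), all(bound_holds(phi) for phi in grid))
```

Equality occurs at the endpoints, which is why the line with slope $K/\pi$ is the steepest one that stays under the sine on this interval; just outside the interval the test fails.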


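This gives us the actual bound we seek in the case $\tilde m = m + 1$, namely

\[
\left| e^{i 2\pi (m+1) \hat y_c / N} - 1 \right| \;=\; 2 \left| \sin\left( \frac{2\pi (m+1) \hat y_c / N}{2} \right) \right| \;\ge\; \frac{2K \left| 2\pi (m+1) \hat y_c / N \right|}{\pi} .
\]

Re-applying the notation

\[
\theta_y \;=\; \frac{2\pi a y}{N},
\]

we get

\[
\left| e^{i 2\pi (m+1) \hat y_c / N} - 1 \right| \;\ge\; \frac{2K \left| \frac{m+1}{a}\, \theta_{\hat y_c} \right|}{\pi} .
\]

Combining the bounds for the numerator and denominator (in the case where $\tilde m = m + 1$),

\[
\left| \frac{e^{i\theta_{\hat y_c} \tilde m / a} - 1}{e^{i\theta_{\hat y_c}/a} - 1} \right|
\;\ge\; \frac{2K \left| \frac{m+1}{a}\, \theta_{\hat y_c} \right| / \pi}{\left| \theta_{\hat y_c} / a \right|}
\;=\; \frac{2K(m+1)}{\pi} \;=\; \frac{2K \tilde m}{\pi} .
\]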


Finishing the Probability Estimates by Combining Both Cases $\tilde m = m$ and $\tilde m = m + 1$


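Remember that when $\tilde m = m$, we had the stronger bound:

\[
\left| \frac{e^{i\theta_{\hat y_c} m / a} - 1}{e^{i\theta_{\hat y_c}/a} - 1} \right| \;\ge\; \frac{2m}{\pi} \;=\; \frac{2 \tilde m}{\pi},
\]

so we can use the new, weaker, bound to cover both cases. For all $\tilde m$, both $m$ and $m+1$, we have

\[
\left| \frac{e^{i\theta_{\hat y_c} \tilde m / a} - 1}{e^{i\theta_{\hat y_c}/a} - 1} \right| \;\ge\; \frac{2K \tilde m}{\pi}
\]

for

\[
K \;=\; \frac{2 \sin\left( \frac{3\pi}{4} \right)}{3} \;\approx\; .4714 .
\]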


We can reproduce the final part of the $\tilde m = m$ argument, incorporating $K$ into the inequalities

\[
P(\text{measurement yields } y_c) \;\ge\; \frac{1}{\tilde m N} \frac{4K^2 \tilde m^2}{\pi^2} \;\ge\; \frac{m}{N} \frac{4K^2}{\pi^2} .
\]

The last inequality still acknowledges that some of the $\tilde m$ are $m+1$ and some are $m$. Again, there are $a$ such $y_c$, namely, $y_0, y_1, \ldots, y_{a-1}$, so the probability of getting any one of them is ($\ge$) $a$ times this number,

\[
P(\text{measurement yields one of the } y_c) \;\ge\; \frac{am}{N} \frac{4K^2}{\pi^2} \;>\; \frac{1}{2} \cdot \frac{4K^2}{\pi^2} \;=\; \frac{2K^2}{\pi^2} \;\approx\; .04503 .
\]
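
The bound can be compared with exact probabilities on a toy instance. The sketch below assumes the measurement amplitude is the geometric sum $\sum_{j=0}^{\tilde m - 1} e^{2\pi i\, j a y / N}$ (the sum that the fraction above is the closed form of), and uses hypothetical sizes $N = 256$, $a = 6$. The names `prob_y` and `special` are introduced here for illustration only.

```python
import cmath
import math

def prob_y(y, a, N, m_tilde):
    """P(y) = |sum_{j < m_tilde} exp(2*pi*i*j*a*y/N)|^2 / (m_tilde * N)."""
    s = sum(cmath.exp(2j * math.pi * j * a * y / N) for j in range(m_tilde))
    return abs(s) ** 2 / (m_tilde * N)

N, a = 256, 6                      # hypothetical sizes for illustration
m = N // a
K = 2 * math.sin(3 * math.pi / 4) / 3

# y_c is the integer with y_c * a in [c*N - a/2, c*N + a/2),
# i.e. the ceiling of (2cN - a) / (2a).
special = [-((a - 2 * c * N) // (2 * a)) for c in range(a)]

for m_tilde in (m, m + 1):
    total = sum(prob_y(y, a, N, m_tilde) for y in special)
    print(m_tilde, round(total, 4), total >= 2 * K ** 2 / math.pi ** 2)
```

In this example the period fits into $N$ many times, so the exact total sits far above the worst-case constant, in line with the note-to-file that follows.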


[Exercise. Compute $K$ when $m = 100$.]


Note-to-file. If $a \ll M < N$ (i.e., $m$ gets very large), as is often the case, two things happen. First,

\[
\frac{m+1}{N} \;\approx\; \frac{m}{N},
\]

so we can use our first estimate, $\tilde m = m$, for measuring a $y_c$, even in the general case. That estimate gave us

\[
P(\text{measurement yields one of the } y_c) \;\ge\; \frac{2}{\pi^2} .
\]

Second, our earlier “Note-to-file” suggested that, for $\tilde m = m$,

\[
P(\text{measurement yields one of the } y_c) \;>\; \frac{4}{\pi^2} - \varepsilon ,
\]

where $\varepsilon \to 0$ as $m \to \infty$.

Putting these two observations together, we conclude that when $a \ll N$, the lower bound for any measurement is $\approx 4/\pi^2$. (“Any” means it doesn’t matter whether the state to which the second register collapsed, $|f(x_0)\rangle$, is associated with an $x_0$ for which there were $m$ or $m+1$ mod-$a$ equivalents in $[0, N-1]$.)


So we have both a hard bound assuming worst case scenarios (the period, $a$, cycles no more than twice in $M$) and the more likely scenario, $a \ll M$. Symbolically,

\[
P(\text{measurement yields one of the } y_c) \;\ge\;
\begin{cases}
.04503, & \text{worst case: } 3a > M \\
.40528, & \text{typically}
\end{cases}
\]

In the worst case, the smaller lower bound doesn’t change a thing; it’s still a constant probability bounded away from zero, independent of $N$, $M$, $a$, etc., and it still gives us a constant time, $T$, for detecting one of our special $y_c$ with arbitrarily high confidence. This, again, follows from the CTC Theorem for looping algorithms, or you can simply apply probability theory directly.


In practice, $a \ll M$, so we can use .40528 as our constant “p of success” bounded away from 0. If we are satisfied with an $\varepsilon = 10^{-6}$ chance of failure, the CTC theorem tells us that the number of passes of our circuit would be

\[
T \;=\; \left\lfloor \frac{\log(10^{-6})}{\log(.59472)} \right\rfloor + 1 \;=\; \left\lfloor \frac{-6}{-.2256875} \right\rfloor + 1 \;=\; 26 + 1 \;=\; 27 .
\]

If we sampled $y$ 27 times, our chances of not measuring at least one $y_c$ would be less than one in a million.


This completes the proof of the fact that we will measure one of the $a$ values $y_c$ in constant time. Our next task is to demonstrate that by measuring relatively few of these $y_c$ we will be able to determine the period. This will be broken into small, bite-sized steps.


23.10.8 STEP III: Associate $\{y_c\}_{c=0}^{a-1}$ with $\{c/a\}_{c=0}^{a-1}$


We now know that we’ll measure one of those $y_c$ with very good probability if we sample enough times ($O(1)$ complexity). But, what’s our real goal here? We’d like to get back to the results of the easy case where we found a number $c$ that was coprime to $a$ in constant time and used that to compute $a$. In the general case, however, $c$ is merely an index of the $y_c$, not a multiple, $cm$, of $m$. What can we hope to know about the $c$ which just indexes the set $\{y_c\}$? You’d be surprised.


In this step, we demonstrate that each of these special, likely-measured $y_c$ values is bound tightly to the fraction $c/a$ in a special way: $y_c/N$ will turn out to be extremely (and uniquely) close to $c/a$. This, in itself, should feel a little like magic: somehow the index of the likely-measured set of $y$s shows up in the numerator of a fraction that is close to $y_c/N$. Let’s pull back the curtain.


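Do you remember those (relatively small) half-open intervals of width $a$ around the points $cN$,

\[
\Big[ -\frac{a}{2}, \frac{a}{2} \Big), \;\; \Big[ N - \frac{a}{2}, N + \frac{a}{2} \Big), \;\; \ldots, \;\; \Big[ cN - \frac{a}{2}, cN + \frac{a}{2} \Big), \;\; \ldots, \;\; \Big[ (a-1)N - \frac{a}{2}, (a-1)N + \frac{a}{2} \Big),
\]

each of which contained exactly one integral multiple, $y_c a$, of $a$ in it?

\[
0 \in \Big[ -\frac{a}{2}, \frac{a}{2} \Big), \;\; \ldots, \;\; y_c a \in \Big[ cN - \frac{a}{2}, cN + \frac{a}{2} \Big), \;\; \ldots, \;\; y_{a-1} a \in \Big[ (a-1)N - \frac{a}{2}, (a-1)N + \frac{a}{2} \Big)
\]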


Let’s use the relationship of those $y_c$ with their host intervals. Start with the obvious,

\[
cN - \frac{a}{2} \;\le\; y_c a \;<\; cN + \frac{a}{2},
\]

then divide by $aN$,

\[
\frac{c}{a} - \frac{1}{2N} \;\le\; \frac{y_c}{N} \;<\; \frac{c}{a} + \frac{1}{2N} .
\]

In other words, $y_c/N$ is within $1/(2N)$ of $c/a$, or symbolically

\[
\left| \frac{c}{a} - \frac{y_c}{N} \right| \;\le\; \frac{1}{2N} .
\]

Remember, the relationship between $M$ and $N$ was defined by

\[
\frac{N}{2} \;<\; M^2 \;\le\; N,
\]

so $N \ge M^2$. Applying that to the above inequality involving $y_c/N$, we get,

\[
\left| \frac{c}{a} - \frac{y_c}{N} \right| \;\le\; \frac{1}{2M^2} .
\]

Figure 23.30: $N = 2^n$ chosen so $(N/2, N]$ brackets $M^2$


Compare that with the distances between consecutive $c/a$ values (which I will prove in a moment),

\[
\left| \frac{c}{a} - \frac{c+1}{a} \right| \;>\; \frac{1}{M^2},
\]

and we can conclude that each of our special $y_c$, when divided by $N$, is closer to $c/a$ than it is to any other fraction with denominator $a$. Thus, when we measure one of these $y_c$ (which we already showed we can do in constant time with arbitrarily good probability) we will be picking out the rational number $c/a$. In fact, we will show something stronger, but first, we fill in the gap and demonstrate why consecutive fractions $c/a$ and $(c+1)/a$ differ by, at least, $1/M^2$.


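Lemma. If two integers $p, q$ satisfy $0 < p, q < M$, then for any distinct rationals $k/p$ and $l/q$, we are guaranteed that

\[
\left| \frac{k}{p} - \frac{l}{q} \right| \;>\; \frac{1}{M^2} .
\]

Use this with $p = q \leftarrow a$, $k \leftarrow c$ and $l \leftarrow c+1$, and you get the aforementioned lower bound for $|c/a - (c+1)/a|$.


Proof. Assume $k/p > l/q$. (Reverse the following if not.) Since the rationals are distinct, the integer $kq - lp$ is at least 1, and $pq < M^2$, so

\[
\left| \frac{k}{p} - \frac{l}{q} \right| \;=\; \frac{kq - lp}{pq} \;>\; \frac{kq - lp}{M^2} \;\ge\; \frac{1}{M^2} . \quad \text{QED}
\]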
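
The lemma is easy to test by brute force for small $M$. The sketch below, whose helper name `min_gap` is an assumption made here, lists every fraction in $[0, 1]$ with denominator below $M$ and measures the smallest gap between neighbors. Restricting to $[0, 1]$ is enough because the set of such fractions contains the integers and repeats with period 1.

```python
from fractions import Fraction

def min_gap(M):
    """Smallest distance between distinct fractions k/p in [0, 1] with 0 < p < M."""
    values = sorted({Fraction(k, p) for p in range(1, M) for k in range(p + 1)})
    return min(hi - lo for lo, hi in zip(values, values[1:]))

for M in (3, 8, 16):
    print(M, min_gap(M), min_gap(M) > Fraction(1, M * M))
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt about the strict inequality.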


This lemma tells us that $c/a$ is not only the best fractional approximation to $y_c/N$ with denominator $a$, but it’s best among all fractions having denominators $< M$. For let $n/d$ be any fraction, other than $c/a$ itself, with denominator $d < M$. The lemma says

\[
\left| \frac{c}{a} - \frac{n}{d} \right| \;>\; \frac{1}{M^2},
\]

which places $n/d$ squarely outside the $1/(2M^2)$ neighborhood that contains both $c/a$ and $y_c/N$.


Conclusion: Of all fractions $n/d$ with denominator $d < M$, $c/a$ is the only one that lies in the neighborhood of “radius” $1/(2M^2)$ around $y_c/N$. Thus $y_c/N$ strongly selects $c/a$ and vice versa.
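
The conclusion can be watched in action on a toy instance. The following sketch uses hypothetical sizes $N = 256$, $M = 16$, $a = 6$ (chosen so that $N/2 < M^2 \le N$ and $a < M/2$) and searches, by brute force, for every fraction with denominator below $M$ that lies within $1/(2M^2)$ of $y_c/N$. The helper names `special_y` and `close_fractions` are introduced only for this illustration.

```python
from fractions import Fraction

N, M, a = 256, 16, 6     # hypothetical sizes: N/2 < M*M <= N and a < M/2

def special_y(c):
    """The integer y_c with y_c * a in [c*N - a/2, c*N + a/2)."""
    return -((a - 2 * c * N) // (2 * a))   # ceiling of (2cN - a) / (2a)

def close_fractions(y):
    """All fractions n/d in [0, 1] with 0 < d < M within 1/(2 M^2) of y/N."""
    x, radius = Fraction(y, N), Fraction(1, 2 * M * M)
    return {Fraction(n, d)
            for d in range(1, M) for n in range(d + 1)
            if abs(Fraction(n, d) - x) <= radius}

for c in range(a):
    y = special_y(c)
    print(c, y, close_fractions(y))
```

For each $c$ the search turns up a single value, the reduced form of $c/a$, which is the “strong selection” described above.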


[Interesting Observation. We showed that

\[
\frac{y_c}{N} \text{ is uniquely close to } \frac{c}{a} .
\]

But multiply both sides by $N$ to get the equivalent conclusion,

\[
y_c \text{ is uniquely close to } c \left( \frac{N}{a} \right) .
\]

What is $N/a$? It is $\text{freq}$, the exact frequency of our function $f$ relative to the interval $[0, N-1]$. (In general, it’s a real number between $m$ and $m+1$.) In other words, each likely-measured $y_c$ is uniquely close to an integer multiple of the function’s exact frequency,

\[
y_c \text{ is uniquely close to } c \cdot \text{freq} .
\]

Our math doesn’t require that we recognize this fact, but it does provide a nice parallel with the easy case, in which our measured $\{y_c\}$ were exact multiples, $\{cm\}$, of the true integer frequency, $\text{freq} = m$.]


23.10.9 STEP IV: Describe an $O(\log^3 N)$ Algorithm that Will Produce $c/a$ from $y_c$


Highlights of Continued Fractions 


The continued fractions algorithm (CFA) will be developed in an optional lesson next week. In short, it takes a real number $x$ as input and produces a sequence of (reduced) fractions, $\{n_k/d_k\}$, that approach (get closer and closer to) $x$. We will be applying CFA to $x = y_c/N$, a rational number itself, but we still want these other fractional approximations because among them we’ll find one, $n/d$, which is identical to our sought-after $c/a$.


Caution: There is no guarantee that the $c \leftrightarrow y_c$ satisfies $c \perp a$ (that is, $c$ coprime to $a$). So when CFA tells us that it has found the reduced fraction $n/d = c/a$, we will not be able to conclude that $n = c$ and $d = a$. We will deal with that wrinkle in Step V.


The plan is to list a few results about CFA, then use those facts to show that it will produce (return) the unique $a$-denominator fraction, $c/a$, closest to $y_c/N$, in $O(\log^3 M)$ operations. (Reminder: $M$ is the known upper bound we were given for $a$.)

Here are the properties of CFA that we’ll need. Some of them are only true when $x$ is rational, but for us, that’s the case.


1. During execution, CFA consists of a loop that produces a fraction in reduced form, $n_k/d_k$, at the end of the $k$th iteration. $n_k/d_k$ is called the $k$th convergent for $x$.


2. For any real number, $x$, the convergents approach $x$ (thus justifying their name), that is

\[
\lim_{k \to \infty} \frac{n_k}{d_k} \;=\; x .
\]

($n$ stands for numerator here, as in numerator/denominator, not the $n$ in the exponent of Shor’s $N = 2^n$.)


3. For rational $x$, the above limit is finite, i.e., there will be a $K < \infty$, with $n_K/d_K = x$, exactly, and no more fractions are produced for $k > K$.


4. In our version of CFA, we will supply a requested degree of accuracy $\varepsilon$, and ask CFA to return $n/d$, the first fraction it generates which is within $\varepsilon$ of $x$. Depending on $\varepsilon$ and $x$, CFA either returns $n/d = n_K/d_K = x$, exactly, as its final convergent, or returns an $\varepsilon$-approximation $n/d \ne x$, but within $\varepsilon$ of it.


5. The denominators $\{d_k\}$ are strictly increasing and, for rational $x$, all $\le$ the denominator of $x$ (whether or not $x$ was given to us in reduced form).


6. If a fraction $n/d$ differs from $x$ by less than $1/(2d^2)$ then $n/d$ will appear in the list of convergents for $x$. Symbolically, if

\[
\left| \frac{n}{d} - x \right| \;<\; \frac{1}{2d^2}
\]

then

\[
\frac{n}{d} \;\in\; \left\{ \frac{n_k}{d_k} \right\}_{k=0}^{K} \quad \text{(the convergents for } x \text{)}.
\]


7. When $x = p/q$ is a rational number, CFA will complete in $O(\log^3 q)$. (Sharper bounds exist, but this is enough for our purposes, and is easy to explain.)
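
Since CFA itself is deferred to a later lesson, the following Python sketch may help fix ideas in the meantime. It uses the standard textbook recurrence for convergents of a rational number and is not necessarily the formulation the later lesson adopts. The generator name `convergents` and the sample input $43/256$ are assumptions made for illustration.

```python
from fractions import Fraction

def convergents(x):
    """Yield the continued-fraction convergents n_k/d_k of a rational x, in order."""
    n_prev, d_prev = 1, 0          # conventional seed values n_{-1}, d_{-1}
    n_pp, d_pp = 0, 1              # conventional seed values n_{-2}, d_{-2}
    num, den = x.numerator, x.denominator
    while den != 0:
        q, r = divmod(num, den)    # one long division, as in Euclid's algorithm
        n, d = q * n_prev + n_pp, q * d_prev + d_pp
        yield Fraction(n, d)
        n_pp, d_pp, n_prev, d_prev = n_prev, d_prev, n, d
        num, den = den, r

x = Fraction(43, 256)
cs = list(convergents(x))
print(cs)
```

For this input the list ends at $x$ itself (property 3), the denominators grow from one convergent to the next (property 5), and $1/6$, which differs from $x$ by $1/768 < 1/(2 \cdot 6^2)$, shows up among the convergents (property 6).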


Procedure for Using CFA to Produce c/a 


We apply CFA to $x = y_c/N$ and $\varepsilon = 1/(2M^2)$.
Claim. CFA will produce and return the unique $c/a$ within $1/(2M^2)$ of $y_c/N$.
Proof. $a < M$, so

\[
\frac{1}{2M^2} \;<\; \frac{1}{2a^2},
\]

but in Step III we showed

\[
\left| \frac{c}{a} - \frac{y_c}{N} \right| \;\le\; \frac{1}{2M^2},
\]

so

\[
\left| \frac{c}{a} - \frac{y_c}{N} \right| \;<\; \frac{1}{2a^2} .
\]

As this is the hypothesis of bullet 6, $c/a$ must appear among the convergents of $y_c/N$. Since it is within $1/(2M^2)$ of $y_c/N$, we know that CFA will terminate when it reaches $c/a$, if not before.

We now show that CFA cannot terminate before its loop produces $c/a$. If CFA returned a convergent $n_k/d_k$ that preceded $c/a$, we would have

\[
\left| \frac{n_k}{d_k} - \frac{y_c}{N} \right| \;\le\; \frac{1}{2M^2}
\]

by bullet 4. But since the $d_k$ are strictly increasing (bullet 5), and we are saying that the algorithm terminated before getting to $c/a$, then

\[
d_k \;<\; a .
\]

That would give us a second fraction, $n_k/d_k$, with denominator $d_k < M$ within $1/(2M^2)$ of $y_c/N$, a title uniquely held by $c/a$ (from Step III). Therefore, when we give CFA the inputs $x = y_c/N$ and $\varepsilon = 1/(2M^2)$, it must produce and return $c/a$.
QED
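
Here is a small sketch of the procedure just proved correct, under the same assumptions as before: the standard convergent recurrence stands in for CFA, and the sizes $N = 256$, $M = 16$, $a = 6$ with their six special $y_c$ values are a hypothetical toy instance. The function name `cfa` is chosen here for illustration.

```python
from fractions import Fraction
from math import gcd

def cfa(x, eps):
    """Return the first continued-fraction convergent of x that is within eps of x."""
    n_prev, d_prev, n_pp, d_pp = 1, 0, 0, 1      # conventional seed values
    num, den = x.numerator, x.denominator
    while den != 0:
        q, r = divmod(num, den)
        n, d = q * n_prev + n_pp, q * d_prev + d_pp
        if abs(Fraction(n, d) - x) < eps:
            return Fraction(n, d)
        n_pp, d_pp, n_prev, d_prev = n_prev, d_prev, n, d
        num, den = den, r
    return x   # not reached for eps > 0: the last convergent equals x

N, M, a = 256, 16, 6                  # hypothetical toy instance
eps = Fraction(1, 2 * M * M)
ys = [0, 43, 85, 128, 171, 213]       # the special y_c for c = 0..5

for c, y in enumerate(ys):
    frac = cfa(Fraction(y, N), eps)
    print(c, y, frac, frac.denominator == a, gcd(c, a) == 1)
```

Each returned fraction equals $c/a$ in lowest terms, so its denominator is $a$ exactly when $c$ is coprime to $a$. That is the wrinkle Step V addresses.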


CFA is $O(\log^3 M)$


By bullet 7, CFA has time complexity $O(\log^3 N)$, which we have already established to be equivalent to $O(\log^3 M)$.


23.10.10 Step V: Measure $y_c$ Associated with $c$ Coprime to $a$ in Constant Time


In Steps I and II we proved that the likelihood of measuring one of the special $\{y_c\}_{c=0}^{a-1}$ was always bounded below by a constant, independent of $N$. (The constant may depend on how many periods, $a$, fit into $M$, but for RSA encryption-breaking, we will see that the special case we require will assure us at least two periods, and normally, many more.)


In the easy case, the $\{y_c\}$ were all equally likely, and we also knew that there were exactly $a/2$ coprimes $< a$. We combined those two facts to put the proof to bed. Here, neither condition holds, so we have to work harder (which is why this isn’t called the “easy case”).


What we can say is that, from Steps III and IV, each of the measured $\{y_c\}$s leads (in constant time, with any predetermined confidence, $\varepsilon$) to a partner fraction in $\{c/a\}$ with the help of some $O(\log^3 M)$ logic provided by CFA.


In this step, we demonstrate that not only do we measure some $y_c$ (constant time) and get a partner, $c/a$ ($O(\log^3 M)$), but that we can even expect to get a special subset $\mathcal B \subseteq \{y_c\} \leftrightarrow \{c/a\}$ in constant time, namely those $y_c$ corresponding to $c \perp a$ (i.e., $c$ coprime to $a$). This will enable us to extract the period $a$ from $c/a$.


We do it all in three steps, the first two of which correspond to the missing conditions we enjoyed in the easy case:


• Stated loosely, in our quantum circuit, the “difference” between measuring the least likely $y_c$ and the most likely $y_c$ is a fixed ratio independent of $a$. (This corresponds to the equi-probabilities of the easy case.)


• The probability of selecting a number, $c$, which is coprime to $a$, at random, from the numbers between 0 and $a-1$ is constant, independent of $a$. (This corresponds to the 50% likelihood of the easy case.)


• We combine the first two bullets to show that the probability of getting a $c \perp a$ ($c$ coprime to $a$) in any single pass of the circuit is bounded away from 0 by a constant independent of the size of the algorithm. That’s the requirement of the CTC theorem for looping algorithms and therefore guarantees we obtain such a $c$ with small error tolerance $\varepsilon$ in constant time, i.e., after a fixed number of loop passes independent of $N$.


Proof of First Bullet 


This can be demonstrated by retooling an analysis we already did. We established that the amplitude squared of a general $|y\rangle$ in our post-QFT A register was

\[
P(\text{measurement yields } y) \;=\; \frac{1}{\tilde m N} \left| \frac{e^{i\theta_y \tilde m} - 1}{e^{i\theta_y} - 1} \right|^2 .
\]

There are clearly two measurement-dependent parameters that will affect this probability: $\tilde m$, which is either $m$ or $m+1$, depending on the collapsed state of the second register, and $\theta_y$, which depends on the measured $y$. When $a \ll M$ the probabilities are very close to being uniform, but to avoid hand-waving, let’s go with our worst-case scenario, the assumption that we added to Shor’s hypothesis in order to get hard estimates for all our bounds: $M > 2a$, but not necessarily any larger.


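When we computed our lower bound on the probability of getting a $y_c$, we used an inequality that contained an angle $\phi$ under the assumption that $\phi$ was restricted to an interval wider than $[-\pi, \pi]$, specifically $[-(3/2)\pi, (3/2)\pi]$. The inequality we found applicable was

\[
\frac{2K|\phi|}{\pi} \;\le\; 2 \left| \sin\left( \frac{\phi}{2} \right) \right| , \quad \text{for } \phi \in \left[ -\tfrac{3}{2}\pi, \tfrac{3}{2}\pi \right],
\]

where $K$ is the constant

\[
K \;=\; \frac{2 \sin\left( \frac{3\pi}{4} \right)}{3} \;\approx\; .4714 .
\]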

The key to our current predicament is to get an upper bound on measuring any (even the most likely) $y_c$. This will lead us to get an inequality for general $\phi$ restricted to a narrower range, namely, $[-\pi/2, \pi/2]$. This can be easily determined using the same graphing or calculus techniques, and is

\[
\frac{2L|\phi|}{\pi} \;\le\; 2 \left| \sin\left( \frac{\phi}{2} \right) \right| , \quad \text{for } \phi \in \left[ -\tfrac{\pi}{2}, \tfrac{\pi}{2} \right],
\]

where $L$ is the constant

\[
L \;=\; 2 \sin\left( \frac{\pi}{4} \right) \;\approx\; 1.4142 .
\]

Figure 23.31: $|\sin(x/2)|$ lies above $|Lx/\pi|$ in the interval $(-\pi/2, \pi/2)$


Let’s see how this gives us the upper bound we seek.


The probability we wish to bound, this time from above, comes from the amplitude whose absolute value is

\[
\frac{1}{\sqrt{\tilde m N}} \left| \frac{e^{i\theta_{\hat y_c} \tilde m / a} - 1}{e^{i\theta_{\hat y_c}/a} - 1} \right| .
\]

We want an upper bound for the magnitude of the fractional term. To that end, we get an upper bound for the numerator and a lower bound for the denominator.


Upper Bound for Numerator 


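The numerator is easy. We derived an upper bound for all angles, $\phi$, so:

\[
\left| e^{i\theta_{\hat y_c} \tilde m / a} - 1 \right| \;\le\; \left| \frac{\theta_{\hat y_c} \tilde m}{a} \right| .
\]

Lower Bound for Denominator

The denominator is

\[
\left| e^{i\theta_{\hat y_c}/a} - 1 \right| \;=\; \left| e^{i (2\pi \hat y_c / N)} - 1 \right| ,
\]

but $2\pi \hat y_c / N \in (-\pi/2, \pi/2)$ because

\[
-\frac{a}{2} \;\le\; \hat y_c \;<\; \frac{a}{2}
\]

or

\[
-\frac{\pi a}{N} \;\le\; \frac{2\pi \hat y_c}{N} \;<\; \frac{\pi a}{N} .
\]

And since

\[
\frac{a}{N} \;<\; \frac{1}{2},
\]

we get

\[
-\frac{1}{2}\pi \;<\; \frac{2\pi \hat y_c}{N} \;<\; \frac{1}{2}\pi .
\]

This allows us to invoke the latest lower bound just mentioned for $\phi \in [-\pi/2, \pi/2]$:

\[
\left| e^{i 2\pi \hat y_c / N} - 1 \right| \;\ge\; \frac{2L \left| 2\pi \hat y_c / N \right|}{\pi}, \qquad L \approx 1.4142 .
\]

Re-applying the notation

\[
\theta_y \;=\; \frac{2\pi a y}{N},
\]

we can write

\[
\left| e^{i 2\pi \hat y_c / N} - 1 \right| \;\ge\; \frac{2L}{\pi} \left| \frac{\theta_{\hat y_c}}{a} \right| .
\]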


Applying Both Bounds 
Combining the bounds for the numerator and denominator, 


O;. 7 / x2 _ lm 


et95./4 — 1 a “i Lr 


Finally, 


p ‘iced e i: 4a 2 m+1 41? 
(measurement yields y.) < AN gh 2 oa ee 

The last inequality still acknowledges that some of the $\tilde m$ are $m+1$ and some are $m$. This time, we are only interested in the probabilities for each individual $y_c$, not getting one of the set $\{y_c\}$, so instead of summing all $a$ of the $y_c$s, we combine this, as is, with the earlier one, which I repeat here,

\[
P(\text{measurement yields } y_c) \;\ge\; \frac{1}{\tilde m N} \frac{4K^2 \tilde m^2}{\pi^2} \;\ge\; \frac{m}{N} \frac{4K^2}{\pi^2},
\]

to get

\[
\frac{P(\text{least-likely } y_c)}{P(\text{most-likely } y_c)} \;\ge\; \frac{mK^2}{(m+1)L^2} .
\]

Our assumption has been that $a < M/2$ so $m$ = (integer quotient) $N/a$ is $\ge 2$ (usually much greater). This, and the estimates for $L \approx 1.4142$ and $K \approx .4714$, result in a ratio which is independent of $a$, $M$ or $N$, between the probability of measuring the least likely $y_c$ and that of measuring the most likely $y_c$:

\[
\frac{P(\text{least-likely } y_c)}{P(\text{most-likely } y_c)} \;\ge\; \frac{mK^2}{(m+1)L^2} \;\ge\; \frac{2K^2}{3L^2} \;>\; .072 . \quad \text{QED}
\]

This covers the bullet about the ratio of the least likely and the most likely $y_c$.


Note-to-file. If $a \ll M < N$ (i.e., $m$ gets very large), as is often the case, both $K$ and $L$ will be close to 1 (review the derivations and previous notes-to-file). This means all $y_c$ are roughly equi-probable. Also $m/(m+1) \approx 1$. Taken together, the ratio of the least likely to most likely is approximately 1.

Summarizing, we have a hard minimum for the worst case scenario (only two intervals of size $a$ fit into $[0, M)$) as well as an expected minimum for the more realistic one ($a \ll M$). That is,

\[
\frac{P(\text{least-likely } y_c)}{P(\text{most-likely } y_c)} \;\ge\;
\begin{cases}
.072, & \text{worst case: } 3a > M \\
1 - \varepsilon, & \text{typically}
\end{cases}
\]


Proof of Second Bullet 


We will show that the probability of randomly selecting a number $c \perp a$ ($c$ coprime to $a$) is $> 60\%$.


First, we express this as a product of terms, each of which involves the probability that a specific prime $p$ satisfies $p | c$ ($p$ divides evenly into $c$).

Note that $a$ is an unknown: no one $a$ is favored over any other $a$, a priori. Also, $c$ is selected at random, by hypothesis of this bullet. So both $a$ and $c$ are considered selected at random with respect to any particular prime, $p$. We use this fact in the derivation.

\[
\begin{aligned}
P(c \perp a) &= P(\text{There is no prime that divides both } c \text{ and } a) \\
&= P\big( \neg(2|c \wedge 2|a) \;\wedge\; \neg(3|c \wedge 3|a) \;\wedge\; \neg(5|c \wedge 5|a) \;\wedge\; \cdots \;\wedge\; \neg(p_k|c \wedge p_k|a) \;\wedge\; \cdots \big), \quad \text{where } p_k = k\text{th prime} \\
&= P\Big( \bigwedge_{p \text{ prime}} \neg(p|c \wedge p|a) \Big) \;=\; \prod_{p \text{ prime}}^{\text{finite}} P\big( \neg(p|c \wedge p|a) \big)
\end{aligned}
\]

Since $\neg(p|c \wedge p|a)$ is always true for $p > a$ or $p > c$, the probabilities for those higher primes, $p$, are all 1, which is why the product is finite.

Next, we compute these individual probabilities. For a fixed prime, $p$, the probability that it divides an arbitrary non-negative $c$ chosen randomly from all non-negative integers is actually independent of $c$,

\[
P(p|c) \;=\; \frac{1}{p} .
\]

[Exercise. Justify this.] This is also true for $p|a$, so,

\[
P(p|c \wedge p|a) \;=\; \frac{1}{p^2} \quad \text{and} \quad P\big( \neg(p|c \wedge p|a) \big) \;=\; 1 - \frac{1}{p^2} .
\]

Remember, both $a$ and $c$ can be considered randomly selected relative to a fixed prime, $p$. If, instead, we restrict ourselves to non-negative values chosen randomly from a finite set, such as $c < a$ (or $a < M$), then the probability that $p|c$ is actually $\le 1/p$. To see this, consider the worst cases, say $a < p$ ($P = 0$) or $p \le a < 2p$ ($P = 1/a \le 1/p$). So the above equalities become bounds which work even better for us,

\[
P(p|c) \;\le\; \frac{1}{p}, \qquad P(p|c \wedge p|a) \;\le\; \frac{1}{p^2} \quad \text{and} \quad P\big( \neg(p|c \wedge p|a) \big) \;\ge\; 1 - \frac{1}{p^2} .
\]

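Finally, we plug this result back into the full product, to get

\[
P(c \perp a) \;=\; \prod_{p \text{ prime}}^{\text{finite}} P\big( \neg(p|c \wedge p|a) \big) \;\ge\; \prod_{p \text{ prime}} \left( 1 - \frac{1}{p^2} \right) \;=\; \frac{1}{\zeta(2)},
\]

where $\zeta(s)$ is the most famous function you never heard of, the Riemann zeta function, defined by

\[
\zeta(s) \;=\; \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}
\]

in Euler product form. The value of $\zeta(2)$ has to be handed off to the mathematical annals, and we’ll simply quote the result,

\[
\zeta(2) \;=\; \frac{\pi^2}{6} \;\approx\; 1.645,
\]

so we have shown that

\[
P(c \perp a) \;\ge\; \frac{1}{\zeta(2)} \;=\; \frac{6}{\pi^2} \;\approx\; .607 . \quad \text{QED}
\]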


That proves our second bullet, which is all we will need, but notice what it implies. Since

\[
P\big( \neg(c \perp a) \big) \;\le\; .393,
\]

after $T$ random integer selections $c_1, c_2, \ldots, c_T$ in the interval $[0, a-1]$,

\[
P\big( \neg(c_1 \perp a) \;\wedge\; \neg(c_2 \perp a) \;\wedge\; \cdots \;\wedge\; \neg(c_T \perp a) \big) \;\le\; .393^T,
\]

constant time, $T$, independent of $N$.


This can be made arbitrarily small by choosing a large enough $T$, so we can expect to get a $c \perp a$ with arbitrarily high probability after $T$ random samples from the set $[0, a-1]$.



Combine Both Bullets to Arrive at Step V 


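To reduce the syntax, it will help to introduce some notation.

\[
\begin{aligned}
\mathcal C &= \{ y_c \}_{c=0}^{a-1} \\
\mathcal B &= \{ y_b \mid y_b \in \mathcal C \text{ and } b \perp a \} \quad (\text{Note: } \mathcal B \subseteq \mathcal C.) \\
|\mathcal B| &= \text{the size (number of elements) in set } \mathcal B \\
|\mathcal C| &= \text{(same)} = a \\
q &= \text{the ratio } \frac{|\mathcal B|}{|\mathcal C|} \\
P(y_c) &= P(\text{we measure a specific } y_c \in \mathcal C)
\end{aligned}
\]

(Note: In the last definition, $y_c$ may or may not also be $\in \mathcal B$.)

\[
\begin{aligned}
P(\mathcal C) &= P(\text{we measure some } y \in \mathcal C) \\
P(\mathcal B) &= P(\text{we measure some } y \in \mathcal B) \\
P(\mathcal B \,|\, \mathcal C) &= P(\text{we measure some } y \in \mathcal B \text{ given that the measured } y \text{ is known to be } \in \mathcal C)
\end{aligned}
\]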
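We would like a lower bound on the probability of measuring a $y_c$ which also has the property that its associated $c$ is coprime to $a$. In symbols, we would like to show:


Claim: $P(\mathcal B) \ge$ some constant independent of $a$, $M$, $N$, etc.
Proof:

\[
P(\mathcal B) \;=\; P(\mathcal B \,|\, \mathcal C)\, P(\mathcal C) \;=\; \left( \frac{\sum_{b \in \mathcal B} P(y_b)}{\sum_{c \in \mathcal C} P(y_c)} \right) P(\mathcal C)
\]

Let

\[
\begin{aligned}
y_{\mathcal B \min} &= y \in \mathcal B \text{ with } P(y_{\mathcal B \min}) \text{ minimum over } \mathcal B \\
y_{\mathcal B \max} &= y \in \mathcal B \text{ with } P(y_{\mathcal B \max}) \text{ maximum over } \mathcal B \\
y_{\mathcal C \min},\; y_{\mathcal C \max} &: \text{ same, except over all of } \mathcal C
\end{aligned}
\]

If there is more than one $y$ that produces minimum or maximum probabilities, choose any one. Then,

\[
\frac{\sum_{b \in \mathcal B} P(y_b)}{\sum_{c \in \mathcal C} P(y_c)}
\;\ge\; \frac{|\mathcal B|\, P(y_{\mathcal B \min})}{|\mathcal C|\, P(y_{\mathcal C \max})}
\;\ge\; \frac{|\mathcal B|\, P(y_{\mathcal C \min})}{|\mathcal C|\, P(y_{\mathcal C \max})}
\;\ge\; q \times .072,
\]

so

\[
P(\mathcal B) \;\ge\; (q \times .072)\, P(\mathcal C) .
\]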


From the proof of the second bullet of this step, $q \ge .607$, and from Step II, $P(\mathcal C) \ge .04503$, so

\[
P(\mathcal B) \;\ge\; .607 \times .072 \times .04503 \;\approx\; .002 .
\]

This is independent of $a$, $M$, $N$, etc. and allows us to apply the CTC theorem for looping algorithms to aver that after a fixed number, $T$, of applications of the quantum circuit we will produce a $y_c$ with $c \perp a$ with any desired error tolerance. We’ll compute $T$ in a moment.


Remember that we used worst-case bounds above. As we demonstrated, the ratio (.072 in the worst case) is normally very close to 1, and $P(\mathcal C) \ge .40528$, so we can expect a better constant lower bound:


P(B) > 607 x 1x 40528 = .266. 


Conclusion of Step V 


After an adequate number of measurements (independent of $a$, $M$), which produce $y_{c_1}, y_{c_2}, \ldots, y_{c_T}$, we can expect, with high probability, at least one of the $y_{c_k} = y_c$ to correspond to $c/a$ with $c \perp a$.


Examples that Use Different Assumptions about a 


How many passes, $T$, do we need to get an error tolerance of, say, $\varepsilon = 10^{-6}$ (one in a million)? It depends on the number of times our period, $a$, fits into the interval $[0, N)$. Under the worst case assumption that we formally required (only two) we would need a much larger number of whacks at the circuit than in a typical problem that fits hundreds of periods in the interval. Let’s see the difference.


The “p of success” bounded away from 0 for a single pass of our algorithm’s loop in the CTC theorem, along with the error tolerance $\varepsilon$, gives us the number of required passes, the formula provided by the theorem,

\[
T \;=\; \left\lfloor \frac{\log(\varepsilon)}{\log(1 - p)} \right\rfloor + 1 .
\]


Worst Case ($P(\mathcal B) = .002$): We solved this near the end of the probability lesson and we found

\[
T \;=\; \left\lfloor \frac{\log(10^{-6})}{\log(.998)} \right\rfloor + 1 \;=\; 6901,
\]

or, more briefly and conservatively, 7000 loop passes.


Typical Case ($P(\mathcal B) \approx .266$): This was also an example in the earlier chapter,

\[
T \;=\; \left\lfloor \frac{-6}{\log(.734)} \right\rfloor + 1 \;=\; 45,
\]

or, rounding up, 50 loop passes.
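
The pass counts quoted in this section can be reproduced with a few lines of Python. The sketch assumes the CTC formula has the form $T = \lfloor \log \varepsilon / \log(1-p) \rfloor + 1$, which matches the worked numbers in the text. The function name `passes_needed` is an illustrative choice.

```python
import math

def passes_needed(p, eps):
    """Number of passes T = floor(log(eps) / log(1 - p)) + 1, so that (1 - p)**T < eps."""
    return math.floor(math.log(eps) / math.log(1 - p)) + 1

# Success probabilities quoted in the text for the two Step V cases and for Step II.
for p in (0.002, 0.266, 0.40528):
    print(p, passes_needed(p, 1e-6))
```

The base of the logarithm does not matter here, since only the ratio of two logs is used.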


It’s important to remember that the data’s actual probability doesn’t care 
about us. We can instruct our algorithm to cycle 7000 times, but if the data 
determined that only 15 loop passes were needed to find the desired c/a, then it 
would return with a successful c/a after 15 passes. Our hyper-conservative estimate 
costs us nothing. 


On the other hand, if we are worried that $a < N/2$ is too risky, and want to allow for only one period of $a$ fitting into $M$ or $N$, the math works. For example, we could require merely $a < .999N$ and still get constant time bounds for all probabilities. Just repeat the analysis replacing 1/2 with .999 to find the more conservative bounds. You would still find that the proofs all work, albeit with $P(\mathcal B) \ge$ a constant much smaller than .002.


23.10.11 Algorithm and Complexity Analysis (General Case) 
Why Finding y. € B Solves Shor’s Problem 


In the final step of the previous section we proved that we will measure $y_c \in \mathcal B$ with near certainty after some predetermined number, $T$, of measurements. Why does such a $y_c$ solve Shor’s period-finding problem? CFA returns $n/d = c/a$ close to $y_c/N$, but the actual numerators and denominators do not have to match: $n/d$ is reduced, but $c/a$ may not be for general $c$. However, we are safe when $y_c \in \mathcal B$, for then $c \perp a$ ($c$ coprime to $a$), and we are assured that $c/a$ is a reduced fraction. In that case, we can assert that the numerators and denominators match; for the denominators, that means $d = a$ and we are done.


This happens if we measure $y \in \mathcal B$, and this is why finding such a $y$ will give us our period and therefore solve Shor’s problem. (Wait. Might we end up getting $d = a$ even if we happened to measure some $y \notin \mathcal B$? In other words, is the “if” in the last sentence “if and only if” or just “if”? Well, there are two ways to deal with this question. You could chase down the logic to see whether a $y \notin \mathcal B$ used to send $x = y/N$ to the CFA could possibly give us $d = a$, but why bother? I prefer the second method to dispatch the question: Our algorithm will test $d = a$ after every measurement, so if (possible or not) we found $a$ for the wrong reason, we still found $a$.)


Shor’s Period-Finding Algorithm, General Case 


After some hard work we have assembled all the results. Two things need to go right to be certain we measured a $y \in \mathcal B$:


1. We measure $y \in \mathcal C = \{y_c\}$.


2. The associated $c$ satisfies $c \perp a$ ($c$ coprime to $a$).


We could fail in either or both, but the test of overall success is that the fraction, $n/d$, returned by $\text{CFA}(y_c/N, 1/(2M^2))$ has the property that $d = a$. And we can test that in a single evaluation of our given function $f$. (Remember $f$? It was given to us in Shor’s hypothesis, and we used it to design/control $U_f$.)


Testing whether, say, $f(1 + d) = f(1)$ is enough to know whether $d$ is the period.


The complexity of this test may or may not be polynomial time, depending on f. 
For RSA encryption-breaking the f in question is O(log*(/)), as we will prove, so for 
that application we will have absolute speed-up over classical solutions. In general, 
this step confines our quantum period-finding to only relativized speed-up. 


So, the short description of the algorithm is this: 


- Run the circuit producing $y_c$s, use those to get $(n/d)$s, stop when we confirm $f(1 + d) = f(1)$, report that $d = a$ (our period) and declare victory.


- In the unlikely event that we run out of time (exceed our established 
T), we admit defeat. 


The full version follows. 


• Select an integer $T$ that reflects an acceptable failure rate based on any known aspects of the period. (E.g., for a failure tolerance of .000001, we might choose $T = 7000$ if we expect only 2 periods, $a$, to fit into $[0, M-1)$ or $T = 45$ if we know $a \ll M$. If we are happy with failure at .001, then these values would be adjusted downward.)


• Repeat the following loop at most $T$ times.

   1. Apply Shor’s circuit.

   2. Measure output of QFT to get $y$.

   3. Apply CFA to $y/N$ and $1/(2M^2)$ to produce $n/d$.

   4. Test $d$: If $f(1 + d) = f(1)$ then $d = a$, (success) break from loop.

   5. Otherwise continue to the next pass of the loop.


• If the above loop ended naturally (i.e., not from the break) after $T$ full passes, we failed. Otherwise, we have found $a$.
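
The classical wrapper around the circuit can be sketched in Python as follows. Everything quantum is replaced by an assumption: `measure` is a hypothetical callable standing in for steps 1 and 2 (run the circuit, read $y$), the toy function $f(x) = x \bmod 6$ and the canned outcomes 0, 85, 43 are made up for the demonstration, and `cfa` is the standard convergent recurrence rather than the version developed in the continued fractions lesson.

```python
from fractions import Fraction

def cfa(x, eps):
    """Return the first continued-fraction convergent of x that is within eps of x."""
    n_prev, d_prev, n_pp, d_pp = 1, 0, 0, 1      # conventional seed values
    num, den = x.numerator, x.denominator
    while den != 0:
        q, r = divmod(num, den)
        n, d = q * n_prev + n_pp, q * d_prev + d_pp
        if abs(Fraction(n, d) - x) < eps:
            return Fraction(n, d)
        n_pp, d_pp, n_prev, d_prev = n_prev, d_prev, n, d
        num, den = den, r
    return x

def find_period(f, measure, N, M, T):
    """At most T passes of: measure y, run CFA, test the candidate denominator d."""
    eps = Fraction(1, 2 * M * M)
    for _ in range(T):
        y = measure()                               # steps 1-2 (stand-in for the circuit)
        d = cfa(Fraction(y, N), eps).denominator    # step 3
        if f(1 + d) == f(1):                        # step 4: success, break from loop
            return d
    return None                                     # loop ended naturally: failure

f = lambda x: x % 6                   # toy function with period 6 (assumption)
outcomes = iter([0, 85, 43])          # canned "measurements" (assumption)
print(find_period(f, lambda: next(outcomes), N=256, M=16, T=3))
```

The first two canned outcomes correspond to $c = 0$ and $c = 2$, neither coprime to 6, so their denominators (1 and 3) fail the test; the third corresponds to $c = 1$ and yields the period.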




Computational Complexity (General Case) 


(This is nearly word-for-word identical to the easy case.) 


We have already seen that $O(\log(N)) = O(\log(M))$ as well as any powers of the logs, so I will use $M$ in the following.


• The Hadamard gates are $O(\log(M))$.
• The QFT is $O(\log^2(M))$.


e Complexity of U; is same as that of f, which could be anything. We don’t count 
that here, but we will show in a separate lecture that for integer factoring, it is 
certainly at least O(log* M). 


• The outer loop is $O(T) = O(1)$, since $T$ is constant.
• The classical CFA sub-algorithm is $O(\log^3(M))$.


• The four non-oracle components, above, are done in series, not in nested loops, so the overall relativized complexity will be the worst among them, $O(\log^3(M))$.


e In the case of factoring needed for RSA encryption breaking (order-finding) 
the actual oracle is O(log*(M)) or better, yielding an absolute complexity of 
O(log*(M)) 


So the entire Shor circuit for an f € O(log*(M)) (true for RSA/order-finding) would 
have an absolute complexity of O(log’(M)). 


23.10.12 Epilogue on Shor’s Period-Finding Algorithm 


Other than a discussion of Euclid’s algorithm for computing the greatest common 
divisor and some facts about continued fractions (both covered in this volume) you 
have completed a rather in-depth development of Shor’s periodicity algorithm for 
quantum computers. 


You studied the circuit, the algorithm and computational complexity for quantum 
period-finding. 

There remains the question of how quantum period-finding can be applied to RSA 
encryption-breaking, which is a form of “order-finding.” Before we close out this first 
course, I’ll cover that as well. 




Chapter 24 


Euclidean Algorithm and 
Continued Fractions 


24.1 Ancient Algorithms for Quantum Computing 


The Euclidean algorithm is a method for computing the greatest common divisor 
of two integers. The technique was described (although probably not invented) by 
the Greek mathematician Euclid in about 300 B.C. Two thousand years later it 
continues to find application in many computational tasks, one of which is Shor’s 
quantum period finding algorithm. In fact, we apply it twice in this context, once 
directly, then again indirectly when we apply a second, centuries old technique called 
continued fractions. 


In this lesson, we will study both of these tools and compute their time complexities.


24.2 The Euclidean Algorithm 


24.2.1 Greatest Common Divisor 


The Euclidean Algorithm (EA) takes two positive integers, P and Q, and returns the 
largest integer that divides both P and Q evenly (i.e., without remainder). Its output 
is called the greatest common divisor of P and Q and is written as a function of its 
two input integers, gcd(P,Q). Of special note, we will learn that 


• $\mathrm{EA}(P,Q) \in O(\log^3 X)$, where X is the larger of the two integers passed to it
(although sharper/subtler bounds exist), and

• EA will be used as a basis for our Continued Fractions Algorithm, CFA.




24.2.2 Long Division 


A long division algorithm (LDA) for integers A and B, both > 0, produces $A \div B$ in
the form of quotient, q, and remainder, r, satisfying

$$A = qB + r.$$

The big-O time complexity of LDA(A, B) in its simplest form is $O(\log^2 X)$, where X
is the larger of {A, B}. If you research this, you'll find it given as $O(N^2)$, where N is
the number of digits in the larger of {A, B}, but that makes $N = \log_{10} X$, and $\log_{10}$
has the same complexity as $\log_2$.


24.2.3. The Euclidean Algorithm, EA(P, Q) 


Although the final result is independent of which number occupies the first parameter 
position, we will assume P is first, as this will affect the intermediate calculations 
which are used by CFA, coming up next. 


I will describe the algorithm without proof — it is well documented and very easy- 
to-follow in all the short descriptions you will find online. 


General Idea 


To produce EA(P,Q), for P,Q > 0, we start by applying long division to P,Q.
LDA(P, Q) returns a quotient, q, and remainder, r, satisfying

$$P = qQ + r.$$

Notice that either r = 0, in which case Q|P and we are done (gcd(P, Q) = Q),
or else we have two new integers to work with: Q and r, where, now, Q > r. We
re-apply long division, this time to the inputs Q and r to get $Q \div r$, and again,
examine the new remainder (call it r′),

$$Q = q'r + r'.$$

Like before, if r′ = 0 we are done (gcd(P,Q) = r), and if not, we keep going. This
continues until we get a remainder equal to 0, at which point the gcd is the integer
“standing next to” (being multiplied by) its corresponding quotient in the most
recent long division (just as Q was “standing next to” q in the initial division, or r
was “standing next to” q′ in the second division).

Let’s add some indexing to our “general idea” before we give the complete algorithm.
Define

$$r_0 \equiv P, \qquad r_1 \equiv Q,$$

and

$$q_0 \equiv \text{the first quotient of } P \div Q, \qquad r_2 \equiv \text{the first remainder of } P \div Q.$$

Using this notation, the initial division becomes

$$r_0 = q_0 \cdot r_1 + r_2.$$

If $r_2 = 0$, return gcd = $r_1$; otherwise, $r_1 > r_2 > 0$, and we repeat, to compute $r_1 \div r_2$,

$$r_1 = q_1 \cdot r_2 + r_3.$$

Continue until $r_k = 0$, at which point return gcd = $r_{k-1}$.


24.2.4 The Algorithm 
EA(P, Q):

• Initialize

$$r_0 = P, \qquad r_1 = Q$$

• Loop over k, starting from k = 0

  - Compute $r_k \div r_{k+1}$ using LDA($r_k$, $r_{k+1}$) to compute $q_k$ and $r_{k+2}$,

$$r_k = q_k \cdot r_{k+1} + r_{k+2},$$

  - until $r_{k+2} = 0$

• Return gcd = $r_{k+1}$




Example 


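We use EA to compute gcd(285, 126).

$$
\begin{aligned}
r_0 &= 285 \\
r_1 &= 126 \\
r_0 &= q_0 \cdot r_1 + r_2 \\
285 &= 2 \cdot 126 + 33 \\
& r_2 \ne 0, \text{ so compute } r_1 \div r_2 \\
r_1 &= q_1 \cdot r_2 + r_3 \\
126 &= 3 \cdot 33 + 27 \\
& r_3 \ne 0, \text{ so compute } r_2 \div r_3 \\
r_2 &= q_2 \cdot r_3 + r_4 \\
33 &= 1 \cdot 27 + 6 \\
& r_4 \ne 0, \text{ so compute } r_3 \div r_4 \\
r_3 &= q_3 \cdot r_4 + r_5 \\
27 &= 4 \cdot 6 + 3 \\
& r_5 \ne 0, \text{ so compute } r_4 \div r_5 \\
r_4 &= q_4 \cdot r_5 + r_6 \\
6 &= 2 \cdot 3 + 0 \\
& r_6 = 0, \text{ so return gcd} = r_5 \\
\gcd &= 3
\end{aligned}
$$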
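The loop translates almost line for line into code. The following Python sketch is only an illustration: the function name `euclid` is hypothetical, and Python's built-in `divmod` stands in for LDA. It returns the gcd together with the list of quotients $q_k$, because those intermediate values are what CFA will use later.

```python
def euclid(p, q):
    """Return (gcd(p, q), [q_0, q_1, ...]) following the r_k recurrence."""
    if p <= 0 or q <= 0:
        raise ValueError("both inputs must be positive integers")
    r_k, r_k1 = p, q                      # r_0 = P, r_1 = Q
    quotients = []
    while r_k1 != 0:
        q_k, r_k2 = divmod(r_k, r_k1)     # r_k = q_k * r_{k+1} + r_{k+2}
        quotients.append(q_k)
        r_k, r_k1 = r_k1, r_k2            # slide the window one index forward
    return r_k, quotients                 # the last non-zero remainder is the gcd


print(euclid(285, 126))  # (3, [2, 3, 1, 4, 2])
```

The printed quotients 2, 3, 1, 4, 2 are the same $q_0, \ldots, q_4$ that appear in the worked example.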


24.2.5 Time Complexity of EA 
This is easiest seen by looking at the very first division,

$$P = qQ + r.$$

Notice that

$$r < \frac{P}{2}.$$

To see this, take $P \ge Q$ (if P < Q, the first division has q = 0 and simply swaps the
two numbers) and break it into two cases.

• case i) $Q \le P/2$. In this case, since r < Q, we have r < P/2.

• case ii) $Q > P/2$. In this case, q = 1 and $r = P - Q < P - P/2 = P/2$.

This same logic, applied to the iterative division

$$r_k = q_k \cdot r_{k+1} + r_{k+2},$$

gives

$$r_{k+2} < \frac{r_k}{2}.$$

In other words, every two divisions, we have reduced $r_k$ by half, forcing the even-indexed
r's to become 0 in at most 2 log P iterations. Therefore, the number of steps
is O(2 log P) = O(log P). (Incidentally, the same argument also works for Q, since
$Q = r_1$, spawning the odd-indexed r's. So, whichever is smaller, P or Q, gives a tighter
bound on the number of steps. However, we don’t need that level of subtlety.) We
have shown that the EA main loop’s complexity is O(log X), where X is the larger
(or either) of P, Q.

Next, we note that each loop iteration calls LDA($r_k$, $r_{k+1}$), which is $O(\log^2 r_k)$,
since $r_k > r_{k+1}$. For all k, $r_k \le X$ (larger of P and Q), so LDA($r_k$, $r_{k+1}$) is also in
the more conservative class, $O(\log^2 X)$.

Combining the last two observations, we conclude that the overall complexity of
EA(P, Q) is the product of complexities of loop, O(log X), and the LDA within the
loop, $O(\log^2 X)$, X being the larger of P and Q. Symbolically,

$$\mathrm{EA}(P,Q) \in O(\log^3 X), \qquad X = \text{larger of } \{P,Q\}.$$
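The halving claim can be spot-checked numerically. The Python sketch below uses two hypothetical helpers, `remainders` and `halving_holds`, to list the remainder sequence and confirm that each remainder is less than half of the one two positions earlier, assuming $P \ge Q$. It is an empirical check on sample inputs, not a proof.

```python
from math import log2


def remainders(p, q):
    """Return the remainder sequence r_0, r_1, ..., ending with the first 0."""
    rs = [p, q]
    while rs[-1] != 0:
        rs.append(rs[-2] % rs[-1])        # r_{k+2} = r_k mod r_{k+1}
    return rs


def halving_holds(p, q):
    """True if r_{k+2} < r_k / 2 at every step (expects p >= q > 0)."""
    rs = remainders(p, q)
    return all(2 * rs[k + 2] < rs[k] for k in range(len(rs) - 2))


rs = remainders(285, 126)
print(rs)                                 # [285, 126, 33, 27, 6, 3, 0]
print(halving_holds(285, 126))            # True
print(len(rs) - 2 <= 2 * log2(285))       # True: 5 divisions, within the 2 log P bound
```

Consecutive Fibonacci numbers such as (89, 55) make every quotient equal to 1, which is a standard stress case for the step count.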


24.3  Convergents and the Continued Fraction Algorithm


24.3.1 Continued Fractions 


Although we won’t deal with such dizzying entities directly, you need to see an example
of at least one continued fraction so we can define the more useful derivative
(convergents) and present the CFA algorithm. A continued fraction is a nested construct
(possibly infinite) of sums and quotients of integers in the form

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}$$

We will only consider $a_0 \ge 0$, and $a_k$ strictly positive for all k > 0.

Notice that the sequence of integers $\{a_0, a_1, a_2, \ldots\}$ completely defines this object
because of its special form: all numerators = 1 and all (except for possibly the last)
denominators the sum of exactly two terms, an integer part, $a_k$, and its fractional
partner, 1 over “something.”

I said that some continued fractions are finite. When that happens there is a final
$a_K$ alone in its denominator, terminating the continued fraction.




The headlines are: 


• Any real number, x, including both rationals (29/48, 7, 8765/1234, etc.) and
irrationals ($\pi$, $\sqrt{2}$, $e^{2\pi}$, etc.), can be expressed as a continued fraction. (We only
care about positive x.)

• When x is rational the continued fraction is finite. When x is irrational the
continued fraction is infinite (it “pencils out” to the irrational x but takes an
infinite amount of graphite to write).


A Rational Example 


An example of a rational number expressed as a continued fraction is (check it yourself):

$$\frac{285}{126} = 2 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{2}}}}$$

You might ask why we bother with a continued fraction of an (already) rational num- 
ber, x. The special form of a continued fraction leads to important approximations 
to x that are crucial in our proof of the quantum period-finding algorithm. 


An Irrational Example 


A famously simple irrational continued fraction is

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}}$$


I will not prove any of the above claims or most of what comes below. The derivations
are widely and clearly published in elementary number theory texts and short webpages
on continued fractions. But I will organize the key facts that are important to
Shor’s Algorithm.


24.3.2 Computing the CFA $a_k$s Using the EA if x is Rational

Before we abandon the ugly nested construct above, let’s look at one easy way to get
the $a_k$ if x is rational. This will tell us something about the computational complexity
of our upcoming continued fractions algorithm (CFA).




CFA Using the EA. If

$$x = \frac{P}{Q},$$

the $\{a_k\}$ in its continued fraction expansion are exactly the unique $\{q_k\}$
of the Euclidean Algorithm, EA(P,Q), for finding the gcd(P,Q).


In other words, we already have an algorithm for expanding a rational number as a
continued fraction. It’s called the Euclidean algorithm and we already presented it.
We just grab the $q_k$ from EA and we’re done.

(That’s why I made a distinction between the first parameter, P, and the second,
Q, and also why I labeled the individual $q_k$ in the EA, which we didn’t seem to need
at the time.)


Time Complexity for Computing a Rational (Finite) Continued Fraction 


Theorem. The time complexity for computing all the $a_k$ for a rational x
is $O(\log^3 X)$, where X is the larger of {P,Q}.

Proof: Since the $\{a_k\}$ of continued fractions are just the $\{q_k\}$ of the EA, and
we have proved that the EA ∈ $O(\log^3 X)$, where X is the larger of P and Q, the
conclusion follows. QED


24.3.3 A Computer Science Method for Computing the $a_k$s
of Continued Fractions

An iterative algorithm is typically used when programming CFA. We use the notation

$$\lfloor x \rfloor \equiv \text{greatest integer} \le x.$$

The CF Method

• Specify a termination condition (e.g., a maximum number of loop passes or a
continued fraction within a specified ε of the target x, etc.).

• Loop over k starting at k = 0 and incrementing until we reach the maximum
number of loop passes or we break sooner due to a “hit” detected in the loop
body.

  1. $a_k \leftarrow \lfloor x \rfloor$
  2. frac $\leftarrow x - \lfloor x \rfloor$
  3. If frac = 0, break from loop; we’ve found x.
  4. $x \leftarrow 1/\text{frac}$

• Return the sequence $\{a_k\}$.
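The four loop steps can be sketched directly in Python. The function name `cf_terms` and the `max_terms` cap are illustrative choices, not part of the method as stated. Passing a `fractions.Fraction` keeps the arithmetic exact, which is an assumption made here to avoid floating-point drift; a float input would also run but loses accuracy after a handful of terms.

```python
from fractions import Fraction
from math import floor


def cf_terms(x, max_terms=64):
    """Return [a_0, a_1, ...] for x, stopping early if the fractional part hits 0."""
    terms = []
    for _ in range(max_terms):            # termination condition: loop-pass cap
        a_k = floor(x)                    # step 1: a_k <- floor(x)
        terms.append(a_k)
        frac = x - a_k                    # step 2: frac <- x - floor(x)
        if frac == 0:                     # step 3: exact hit, x has been found
            break
        x = 1 / frac                      # step 4: x <- 1 / frac
    return terms


print(cf_terms(Fraction(285, 126)))       # [2, 3, 1, 4, 2]
```

The output matches the quotients produced by EA(285, 126), as the "CFA Using the EA" statement predicts.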




24.3.4 Convergents of a Continued Fraction 
Algorithmic Definition of Convergents 


Rather than using the unwieldy continued fractions directly, we’ll work with the much 
more tractable “simple fractions” which derive from them. 


The kth Convergent of x. If we forcibly stop the loop in the CF
method after k iterations, we produce (and return) a number which is only 
an approximation of x. This approximation is called the kth convergent 


of x. 


Of course if x is irrational, we can only ever get an approximation, so this is not 
big news. But if x is rational, and we halt the process prematurely, before $a_k = a_K$,
the final denominator, then we only have an approximation of x even though
we might have been more patient and waited to generate the full expression of x, 
itself. Surprisingly, these approximations are sometimes more useful than the whole 
enchilada. Certainly, for our purposes, they will turn out to be. 


When we fully evaluate the kth convergent it will, of course, resolve to a fraction 
in its own right, and we always express that fraction in reduced form. 


Algebraic Definition of Convergents 


We could have defined the kth convergent of x when we first introduced continued
fractions. They are just the partially completed constructs that we see when writing
out (or attempting to write out) the full continued fraction. Explicitly, the first few
convergents are

$$\frac{n_0}{d_0} = a_0, \qquad \frac{n_1}{d_1} = a_0 + \frac{1}{a_1}, \qquad \frac{n_2}{d_2} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}},$$

and generally,

$$\frac{n_k}{d_k} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_{k-1} + \cfrac{1}{a_k}}}}.$$




Example 


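For the rational x = 285/126, whose continued fraction and $\{a_k\}$ we computed earlier,
you can verify that the convergents are

$$\frac{n_0}{d_0} = \frac{2}{1}, \quad \frac{n_1}{d_1} = \frac{7}{3}, \quad \frac{n_2}{d_2} = \frac{9}{4}, \quad \frac{n_3}{d_3} = \frac{43}{19}, \quad \text{and} \quad \frac{n_4}{d_4} = \frac{95}{42}.$$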


Notice that the final convergent is our original x in reduced form. This is very impor- 
tant for our purposes. Figures 24.1 and 24.2 show two graphs of these convergents at 
different zoom levels. 


[Figure 24.1: A wide shot of the convergents starting at 2/1 (target x = 95/42)]

[Figure 24.2: A close-up showing some of the latter convergents (target x = 95/42)]


A couple features might pop: 




1. The convergents “converge” very fast — every two convergents are much closer 
together than the previous two, and 


2. The convergents bounce back-and-forth around the target, x, alternating less- 
than-x values and greater-than-x values. 


Before making this precise, let’s see if it holds for a second example. 


Example 


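For the rational x = 11490/16384 the convergents are:

$$\frac{0}{1}, \quad \frac{1}{1}, \quad \frac{2}{3}, \quad \frac{5}{7}, \quad \frac{7}{10}, \quad \frac{54}{77}, \quad \frac{1897}{2705}, \quad \frac{5745}{8192} = \frac{11490}{16384}.$$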


This time, we’ll need more zoom levels because the later convergents are so close to
x they are impossible to see. Figures 24.3 through 24.7 show various stages of the
convergents.

[Figure 24.3: First view of convergents (target x = 5745/8192)]

[Figure 24.4: Second view of convergents (target x = 5745/8192)]

[Figure 24.5: Third view of convergents (target x = 5745/8192)]

[Figure 24.6: Fourth view of convergents (target x = 5745/8192)]

[Figure 24.7: Fifth view of convergents (target x = 5745/8192)]




24.3.5 An Algorithm for Computing the Convergents $\{n_k/d_k\}$

• Specify a termination condition (e.g., a maximum number of loop passes or a
convergent within a specified ε of the target x, etc.).

• Invoke the CF method, given earlier, that produces the entire sequence $\{a_k\}$ to
the desired accuracy. (Note: The CF method can be merged into this algorithm
by combining loops carefully.)

• Set the first two convergents, manually:

$$n_0 \leftarrow a_0, \qquad d_0 \leftarrow 1$$

$$n_1 \leftarrow a_1 a_0 + 1, \qquad d_1 \leftarrow a_1$$

• Loop over k starting at k = 2 and iterating until k = K, the final index of the
sequence $\{a_k\}$ returned by the CF method.

  1. $n_k \leftarrow a_k n_{k-1} + n_{k-2}$
  2. $d_k \leftarrow a_k d_{k-1} + d_{k-2}$

• Return the sequence $\{n_k/d_k\}$.
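The recurrence is short enough to try out immediately. In this Python sketch the function name `convergents` is hypothetical, and the list of $a_k$ is passed in directly rather than produced by the CF method, purely to keep the example self-contained.

```python
from fractions import Fraction


def convergents(terms):
    """Return [n_0/d_0, n_1/d_1, ...] from the continued-fraction terms a_k."""
    n_prev, d_prev = terms[0], 1                          # n_0 <- a_0, d_0 <- 1
    result = [Fraction(n_prev, d_prev)]
    if len(terms) == 1:
        return result
    n_cur, d_cur = terms[1] * terms[0] + 1, terms[1]      # n_1 <- a_1 a_0 + 1, d_1 <- a_1
    result.append(Fraction(n_cur, d_cur))
    for a_k in terms[2:]:                                 # k = 2, ..., K
        n_prev, n_cur = n_cur, a_k * n_cur + n_prev       # n_k <- a_k n_{k-1} + n_{k-2}
        d_prev, d_cur = d_cur, a_k * d_cur + d_prev       # d_k <- a_k d_{k-1} + d_{k-2}
        result.append(Fraction(n_cur, d_cur))
    return result


print([str(c) for c in convergents([2, 3, 1, 4, 2])])
# ['2', '7/3', '9/4', '43/19', '95/42']
```

The final entry, 95/42, is 285/126 in reduced form, in agreement with the first worked example.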


24.3.6 Easy Properties of the Convergents 


The convergents $\{n_k/d_k\}$ have the following properties which are derived in most
beginning tutorials in very few steps.

1. For any real number, x, the convergents approach x,

$$\lim_{k \to \infty} \frac{n_k}{d_k} = x.$$

2. For rational x, the above limit is finite, i.e., there will be a K < ∞, with
$n_K/d_K = x$ exactly, and no more fractions are produced for k > K.

3. They alternate above and below x,

$$\frac{n_k}{d_k} \begin{cases} > x, & \text{if } k \text{ is odd} \\ < x, & \text{if } k \text{ is even} \end{cases} \qquad (\text{assuming } n_k/d_k \ne x).$$

4. Each $n_k/d_k$ (if not the final convergent which is exactly x) differs from x by no
more than $1/(d_k d_{k+1})$.

5. For k > 0, $n_k/d_k$ is the best approximation to x of all fractions with denominator
≤ $d_k$,

$$\left| x - \frac{n_k}{d_k} \right| \le \left| x - \frac{n}{d} \right|, \quad \text{for all } d \le d_k.$$

6. Consecutive convergents differ from each other by exactly $1/(d_k d_{k+1})$.

7. The denominators $\{d_k\}$ are strictly increasing and, if x is rational, are all ≤ the
denominator of x (whether or not x was given to us in reduced form).

8. When x = P/Q is a rational number, computation of all the convergents is
∈ $O(\log^3 X)$, where X is the larger of {P,Q}. (This follows from the fact that
the convergent algorithms above are based on EA as well as the details of the
loops used.)
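Properties 3, 4 and 6 are easy to test on the two rational examples. The Python sketch below is a numerical spot check under stated assumptions, not a derivation: the helper names `convergents_of` and `check_properties` are hypothetical, and the convergents are generated with the standard seed values $n_{-1}/d_{-1} = 1/0$ and $n_{-2}/d_{-2} = 0/1$, which reproduce the manual initialization of $n_0, d_0, n_1, d_1$.

```python
from fractions import Fraction


def convergents_of(x):
    """All convergents of a positive rational x (a Fraction), in index order."""
    out = []
    n_prev, d_prev, n_cur, d_cur = 0, 1, 1, 0            # seeds for k = -2 and k = -1
    while True:
        a_k = x.numerator // x.denominator               # floor(x)
        n_prev, d_prev, n_cur, d_cur = (
            n_cur, d_cur, a_k * n_cur + n_prev, a_k * d_cur + d_prev)
        out.append(Fraction(n_cur, d_cur))
        frac = x - a_k
        if frac == 0:
            return out
        x = 1 / frac


def check_properties(x):
    """Assert properties 3, 4 and 6 for every non-final convergent of x."""
    cs = convergents_of(x)
    for k in range(len(cs) - 1):
        gap = Fraction(1, cs[k].denominator * cs[k + 1].denominator)
        assert (cs[k] > x) == (k % 2 == 1)               # property 3: alternation
        assert abs(x - cs[k]) <= gap                     # property 4: error bound
        assert abs(cs[k] - cs[k + 1]) == gap             # property 6: exact spacing
    return cs[-1] == x                                   # property 2: last one is x


print(check_properties(Fraction(285, 126)))              # True
print(check_properties(Fraction(11490, 16384)))          # True
```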


There is one not-so-easy-to-prove fact that we will need. It can be found in An 
Introduction to the Theory of Numbers by Hardy and Wright (Oxford U. Press) as 
Theorem 184. The proof is rather involved. 


• If a fraction n/d differs from x by less than $1/(2d^2)$ then n/d will appear in the
list of convergents for x. Symbolically:

If

$$\left| x - \frac{n}{d} \right| < \frac{1}{2d^2}$$

then

$$\frac{n}{d} \in \left\{ \frac{n_k}{d_k} \right\}_{k=0}^{K} \quad (\text{the convergents for } x).$$
24.3.7 CFA: Our Special Brand of Continued Fraction Algorithm

Our version of a convergent-generating algorithm, which I call CFA, will take as input
parameters a rational target x and requested degree of accuracy ε. CFA(x, ε) will
return n/d, the first convergent (i.e., that with the smallest index k) to x within ε
of x. It simply wraps the previous algorithm into an envelope that returns a single
fraction (rather than all the convergents).

1. To our previous algorithm for generating the convergents, pass x along with the
terminating condition that it stop looping when it detects that $|x - n_k/d_k| < \varepsilon$.

2. Return $n_k/d_k$.

That’s all there is to it.

Depending on ε and x, CFA(x, ε) either returns $n/d = n_K/d_K = x$, exactly, as its
final convergent or an ε-approximation $n/d \ne x$, but within ε of it.
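A compact Python sketch of CFA follows. It merges the CF loop and the convergent recurrence into one pass, which the earlier note says is possible; the function name `cfa` and the choice to take x as a `fractions.Fraction` are assumptions made for illustration.

```python
from fractions import Fraction


def cfa(x, eps):
    """Return the first convergent n/d of the rational x with |x - n/d| < eps.

    If no earlier convergent is that close, the final convergent (equal to x)
    is returned.
    """
    target = x
    n_prev, d_prev, n_cur, d_cur = 0, 1, 1, 0            # seeds for k = -2 and k = -1
    while True:
        a_k = x.numerator // x.denominator               # a_k <- floor(x)
        n_prev, d_prev, n_cur, d_cur = (
            n_cur, d_cur, a_k * n_cur + n_prev, a_k * d_cur + d_prev)
        conv = Fraction(n_cur, d_cur)
        frac = x - a_k
        if abs(target - conv) < eps or frac == 0:        # close enough, or exact
            return conv
        x = 1 / frac


x = Fraction(285, 126)
print(cfa(x, Fraction(1, 10)))      # 7/3
print(cfa(x, Fraction(1, 1000)))    # 95/42
```

With ε = 1/10 the search stops at 7/3, which is 1/14 away from x; with ε = 1/1000 no proper approximation qualifies (43/19 is 1/798 away), so the exact value 95/42 comes back.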




Chapter 25 


From Period-Finding to Factoring 


25.1 Period Finding to Factoring to RSA Encryption


RSA encryption relies on an interesting computational and mathematical fact. Given
a large enough positive integer, M = pq, which is the product of two primes, it would
take the world’s fastest super computer billions of years to compute the two unknown 
factors p and q. How large is large enough? An M with around 28 decimal digits 
would work. 


You'd have to take a course in classical cryptography, or find a tutorial on the
Web, to learn why the inability to factor large numbers leads to encryption and how
RSA works. Our task today is to learn how a quantum computer can solve the
same factoring problem for such a large number efficiently (potentially in minutes or
seconds). We have, in our quiver, a very powerful arrow: Shor’s quantum algorithm
for period-finding. Let’s see how a hypothetical quantum computer could leverage
it to overcome the computational limitation to which a classical super computer is
subject.


25.2 The Problem and Two Classically Easy Cases 


The Factoring Problem. Given some $M \in \mathbb{Z}_{>0}$ that we know is not
prime, find a $q \in \mathbb{Z}_{>1}$ that satisfies q|M.

[Variable Name: Most number theory publications use the letter N for the
large number to be factored. I have been using N to be the power-of-2 larger than
$M^2$, where M is our bound on the period a and the number we want to factor, so I
will use M for the arbitrary integer in this section.]

We will want to narrow down the kind of M that we’re willing to look at, and there
are two cases which can be tested and factored by ordinary computers efficiently,




1. M even, and 


2. $M = p^k$, k > 1, is a power of some prime.


Why the Two Cases Are “Easy” 


The test “M even?” entails a simple examination of the least significant bit (0 or 1),
trivially fast. Meanwhile, there are easy classical methods that determine whether
$M = p^k$ for some prime p and produce such a p in the process (thus providing a
divisor of M).


In fact, we can dispose of a larger class of M: those M for which $M = q^k$, k > 1,
for any integer q < M, prime or not, and produce q in the process, all using classical
machinery. If we detected that case, it would provide a factor q, do so without
requiring Shor’s quantum circuit, and cover the more restrictive condition 2, in which
q is a prime.

So why does the second condition only seek to eliminate the case in which M is
a power of some prime p before embarking on our quantum algorithm rather than
using classical methods to test and bypass the larger class of M that are powers of
any integer, q? First, eliminating only those M that are powers of a single prime
is all that the quantum algorithm actually requires. So once we have disposed of
that possibility, we are authorized to move on to Shor’s quantum algorithm. Second,
knowing we can move on after confirming $M \ne p^k$, for a p prime, gives us options.


• We can ask the number theorists to provide a very fast answer to the question
“is $M = p^k$, p prime,” and let the quantum algorithm scoop up the remaining
cases (which include $M = q^k$, q not prime).

• Alternatively, we can apply a fast classical method to search for a q (prime or
not) that satisfies $M = q^k$, thus avoiding the quantum algorithm in a larger
class of M.


One of the above two paths may be faster than the other in any particular {hardware 
+ software} implementation, so knowing that we can go either way gives us choices. 


Now let’s outline why either of the two tests

$$M = q^k,\ k > 1 \qquad \text{or} \qquad M = p^k,\ k > 1,\ p \text{ prime}$$


can be dispatched classically. 


Classical Algorithm for Larger Class: $M = q^k$, k > 1

Any such power k would have to satisfy $k < \log_2 M$ since q > 2 (we’ve eliminated
even). Therefore, for every $k < \log_2 M$ we compute the integral part

$$q = \left\lfloor \sqrt[k]{M} \right\rfloor$$

(something for which fast algorithms exist) and test whether $q^k = M$. If it does, q is
our divisor, and we have covered the case $M = q^k$, k > 1, for any integer q without
resorting to quantum computation. The time complexity is polynomial fast because

• the outer loop has only $\log_2 M$ passes (one for each k),

• even a slow, brute force method to compute $q = \lfloor \sqrt[k]{M} \rfloor$ has a polynomial
big-Oh, and

• taking the power $q^k$ is also polynomial fast. (Moreover, the implementation of
the previous bullet can be designed to absorb this computation, obviating it.)

[Exercise. Design an algorithm that implements these bullets and derive its
big-Oh.]
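One possible shape for such an algorithm is sketched below in Python. It is a starting point rather than a worked solution to the exercise: the names `integer_root` and `perfect_power` are hypothetical, the k-th root is found by a plain binary search (one of the "slow, brute force" options), and no big-Oh is derived here.

```python
def integer_root(m, k):
    """Largest integer q with q**k <= m, by binary search (no floating point)."""
    lo, hi = 1, m
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= m:
            lo = mid                      # mid is still a valid lower bound
        else:
            hi = mid - 1
    return lo


def perfect_power(m):
    """Return (q, k) with q**k == m and k > 1, or None if no such pair exists."""
    k = 2
    while 2 ** k <= m:                    # outer loop: k never exceeds log2(m)
        q = integer_root(m, k)
        if q ** k == m:                   # the test q^k = M
            return q, k
        k += 1
    return None


print(perfect_power(243))   # (3, 5)
print(perfect_power(225))   # (15, 2)
print(perfect_power(15))    # None
```

Note that 225 is reported as $15^2$ even though 15 is not prime, which is exactly the larger class $M = q^k$ discussed above.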


Classical Algorithm for Smaller Class: $M = p^k$, k > 1, p Prime

But if one wanted to also know whether the produced q in the above process was
prime, one could use an algorithm like AKS (do a search), which has been shown to
run in time polynomial in the number of digits of M, i.e., polynomial in log M.

This approach was based on first finding a q with $M = q^k$, then going on to
determine whether q was prime. That’s not efficient, and I presented it only because
the components are easy, off-the-shelf results that can be combined to prove the
classical solution is polynomial fast. In practice we would seek a solution that tests
whether M is a power of a prime directly, using some approach that was faster than
testing whether it is a power of a general integer, q.


Why We Eliminate the Two Cases 


The reason we quickly dispose of these two cases is that the reduction of factoring
to period-finding, described next, will not work for either one. However, we now
understand why we can be comfortable assuming M is neither even nor a power of a
single prime and can proceed based on that supposition.


25.3 A Sufficient Condition 


The next step is to change the factoring problem into a proposition with which
we can work more easily.

Claim. We will obtain a q|M if we can find an x with the property that

$$x^2 \equiv 1 \pmod{M}, \quad \text{for} \quad x \not\equiv \pm 1 \pmod{M}.$$

Proof

$$
\begin{aligned}
x^2 \equiv 1 \pmod{M} \;&\Longrightarrow\; x^2 - 1 \equiv 0 \pmod{M} \\
&\Longrightarrow\; M \mid (x^2 - 1) \\
&\Longrightarrow\; M \mid (x - 1)(x + 1).
\end{aligned}
$$

That can only happen if M has a factor, p > 1, in common with one or both of (x − 1)
and (x + 1), i.e.,

$$p \mid M \text{ and } p \mid (x - 1), \quad \text{or} \quad p \mid M \text{ and } p \mid (x + 1).$$

That factor, p, cannot be M, itself, for if it were, either

$$M \mid (x - 1), \text{ contrary to } x \not\equiv +1 \pmod{M}, \quad \text{or} \quad M \mid (x + 1), \text{ contrary to } x \not\equiv -1 \pmod{M},$$

both outlawed by the hypothesis of the “claim”. Define

$$q \equiv \begin{cases} \gcd(M, x - 1), & \text{if common factor } p \mid (x - 1), \text{ or} \\ \gcd(M, x + 1), & \text{if common factor } p \mid (x + 1). \end{cases}$$

Whichever of the above two cases is true (and we just proved at least one must be),
we have produced a q with q|M. QED

The time complexity of gcd(M, k), M > k, is shown in another lecture to be
$O(\log^3 M)$, so once we have x, getting q is “fast.”
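To see the claim in action on small numbers, here is a Python sketch. The function name `factors_from_root` is hypothetical and `math.gcd` stands in for our GCD routine. It computes both candidate gcds and keeps whichever lie strictly between 1 and M, rather than deciding in advance which of the two cases applies.

```python
from math import gcd


def factors_from_root(x, m):
    """Given x with x^2 ≡ 1 and x ≢ ±1 (mod m), return the proper factors found."""
    x %= m
    if (x * x) % m != 1 or x in (1, m - 1):
        raise ValueError("x does not satisfy the sufficient condition")
    candidates = (gcd(m, x - 1), gcd(m, x + 1))
    return [q for q in candidates if 1 < q < m]           # keep only proper divisors


print(factors_from_root(4, 15))   # [3, 5]   since 4^2 = 16 ≡ 1 (mod 15)
print(factors_from_root(8, 21))   # [7, 3]   since 8^2 = 64 ≡ 1 (mod 21)
```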


Before we find x, we take a short diversion to describe something called order-finding.


25.4 A Third Easy Case and Order-Finding in $\mathbb{Z}_M$

Pick a y at random from $\mathbb{Z}_M - \{0, 1\} = \{2, 3, 4, \ldots, M-1\}$. Either y will be
coprime to M ($y \perp M$) or it won't.

• If $\neg(y \perp M)$ we’re done, because q ≡ gcd(M, y) will be our desired factor of
M (and we don’t even need to look for an x). An $O(\log^3 M)$ application of GCD
will reveal this fact by either returning a q > 1 (we’re done) or result in 1 (we
continue).

• If $y \perp M$ we go on to find the “order” of y, defined next and which leads to the
rest of the algorithm.

We therefore assume that $y \perp M$, since if it were not we would have lucked upon the
first bullet and factored M; that would be a third easy case we might encounter.
While it may not be obvious, finding the “order” of y in $\mathbb{Z}_M$ will be the key. In
this section we define “order” and learn how we compute it with the help of Shor’s
quantum period-finding; the final section will explain how doing so factors M.


Order. The order of $y \in \mathbb{Z}_M$, also called the order of y (mod M), is
the smallest positive integer, b > 1, such that

$$y^b \equiv 1 \pmod{M}.$$

Why are we so sure such a b > 1 exists? Consider all powers, $y^k$, k = 1, 2, 3, ....
Since $\mathbb{Z}_M$ is finite, so is

$$\left\{ y^k \pmod{M} \right\}_{k=1}^{\infty} \quad (\text{being a} \subseteq \mathbb{Z}_M),$$

and there must be many (infinitely many) distinct pairs k, k′ with

$$y^k \equiv y^{k'} \pmod{M}.$$

For each pair, take k′ to be the larger of the two, and write this last equality as

$$y^k \equiv y^{k+b} \pmod{M}, \qquad b \equiv k' - k > 0.$$

We just argued that there are infinitely many k > 1 (with potentially a different b > 0
for each k) for which the above holds. There must be a smallest b that satisfies this
among all the pairs. (Once you find one pair, take the k and b for that pair. Keep
looking for other pairs with different ks and smaller bs. You can’t do this indefinitely,
since eventually you’d reach b = 0. It doesn’t matter how long this takes; we only
need the existence of such a b, not to produce it, physically.) Assume this last equality
represents that smallest b > 0 for any k which makes it true. That means, there exists
a pair with

$$y^k - y^{k+b} \equiv 0 \pmod{M}, \qquad b \text{ minimal over all } k.$$

Factoring,

$$
\begin{aligned}
y^k \left(1 - y^b\right) \equiv 0 \pmod{M} \;&\Longrightarrow\; M \mid y^k \left(1 - y^b\right) \\
&\Longrightarrow\; M \mid \left(1 - y^b\right).
\end{aligned}
$$

The last step holds because $y^k$ is coprime to M, since we are working inside the major
case in which we were unlucky enough to pick a y coprime to M. We're done (proving
that an order b of y exists) because the final equality means

$$y^b \equiv 1 \pmod{M}.$$
From here on we call this order a,

$$a \equiv b,$$

which implies (multiply the last congruence by $y^x$ to see why)

$$y^{x+a} \equiv y^x \pmod{M}, \qquad a \text{ minimal}.$$

That means the function

$$f(x) = y^x \pmod{M},$$

built upon our randomly selected y, is periodic with period a. Furthermore, it is
$\mathbb{Z}_M$-periodic, which implies the extra condition

$$f(x') \ne f(x) \quad \text{whenever } 0 < |x - x'| < a.$$

(Review the definition of b and its minimality to verify this extra condition.)

We'll also be using the fact that the period, a, is less than M. Here’s why we can
assume so. The order of y has to divide the size of the group to which y belongs,
because the order of every element in a finite group divides evenly into the size of
the group (see elementary group theory, if you’d like to research it). The group here
consists of the elements of $\mathbb{Z}_M$ that are coprime to M, under multiplication (mod M).
It excludes 0, so it has fewer than M elements, and therefore a < M.


Enter Quantum Computing. We have a $\mathbb{Z}_M$-periodic function, $f(x) = y^x$
(mod M), with unknown period, a < M. This is the hypothesis of Shor’s quantum
algorithm which we have already proved can be applied in $\log^3 M$ time. This is exactly
where we would use our quantum computer in the course of factoring M.

We have picked a $y \in \mathbb{Z}_M - \{0, 1\}$ at random, defined a function based on that
y, and found its period, a. That gave us the order of y in $\mathbb{Z}_M$. The next, and final,
step is to demonstrate how we use the order, a, to factor M.
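For small M we can find the order by brute force and watch the periodicity directly. The Python sketch below is a classical stand-in, exponentially slow in the number of digits of M, for what the quantum circuit does; the function name `order` is hypothetical, and the loop simply multiplies by y until it reaches 1.

```python
from math import gcd


def order(y, m):
    """Smallest b >= 1 with y^b ≡ 1 (mod m); requires gcd(y, m) == 1."""
    if gcd(y, m) != 1:
        raise ValueError("y must be coprime to m")
    b, power = 1, y % m
    while power != 1:
        power = (power * y) % m           # power now holds y^(b+1) mod m
        b += 1
    return b


def f(x):
    return pow(7, x, 15)                  # f(x) = 7^x (mod 15)


b = order(7, 15)
print(b)                                              # 4
print([f(x) for x in range(8)])                       # [1, 7, 4, 13, 1, 7, 4, 13]
print(all(f(x + b) == f(x) for x in range(20)))       # True
```

The values of f repeat every 4 steps and never sooner, which is the periodicity that period-finding is asked to detect.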


25.5 Using the Order of y to Find the x of our
“Sufficient Condition”

We’ve already established that if we can find an x that satisfies

$$x^2 \equiv 1 \pmod{M}, \qquad x \not\equiv \pm 1 \pmod{M},$$

and do so efficiently (in polynomial time), we will get our factor q of M. We did
manage to leverage quantum period-finding to efficiently get the order of a randomly
selected $y \in \mathbb{Z}_M$, so our job is to use that order, a, without using excessive further
computation, to factor M. There are three cases to consider:

1. a is even, and $y^{a/2} \not\equiv -1 \pmod{M}$.

2. a is even, and $y^{a/2} \equiv -1 \pmod{M}$.

3. a is odd.


Case 1: Even a, with $y^{a/2} \not\equiv -1 \pmod{M}$

Claim.

$$x \equiv y^{a/2}$$

satisfies our sufficient condition.

Proof

$$x^2 = y^a, \text{ where } a = \text{order of } y \pmod{M}, \text{ so}$$

$$x^2 \equiv 1 \pmod{M}.$$

That’s half the sufficient condition. The other half is $x \not\equiv \pm 1 \pmod{M}$. Our
assumption in this case is that

$$x \not\equiv -1 \pmod{M},$$

so we only need to show that

$$x \not\equiv +1 \pmod{M},$$

and we'll have shown that this x satisfies our sufficient condition. Proceed by contradiction.
What would happen if

$$x \equiv +1 \pmod{M}?$$

Then

$$y^{a/2} \equiv 1 \pmod{M}, \text{ i.e., } \quad y^{(a/2)+1} \equiv y \pmod{M}.$$

If a = 2, then

$$y^2 \equiv y \pmod{M}, \text{ i.e., } \quad y \equiv 1 \pmod{M},$$

which contradicts that $y \in \mathbb{Z}_M - \{0, 1\}$ (1 is not in that set). So we are forced to
conclude that a > 2, which gives

$$\frac{a}{2} > 1, \quad \text{so} \quad \frac{a}{2} + 1 < \frac{a}{2} + \frac{a}{2} = a.$$

But now we have an $a' \equiv (a/2) + 1 < a$ that satisfies

$$y^{a'} = y^{(a/2)+1} \equiv y \pmod{M}.$$

Canceling one factor of y (allowed because y is coprime to M) leaves $y^{a'-1} \equiv 1 \pmod{M}$
with $0 < a' - 1 < a$. That contradicts that a is the order of y (mod M), because the order
is the smallest positive exponent with that property, by construction. QED

That dispatches the first case; we have found an x which satisfies the sufficient
condition needed to find a factor, q, of M.


Cases 2 and 3: Even a, with $y^{a/2} \equiv -1 \pmod{M}$, or Odd a

I combine these two cases because, while they are both possible, we rely on results
from number theory (one being the Chinese Remainder Theorem) which tell us that
the probability of both cases, taken together, is never more than .5, i.e.,

$$P(\text{case 2} \vee \text{case 3}) \le \frac{1}{2}.$$

This result is independent of M. That means that if we repeatedly pick y at random,
T times, the chances that we are unlucky enough to get case 2 or case 3 in all T trials
is

$$P\left( \bigwedge_{k=1}^{T} (\text{case 2} \vee \text{case 3})_k \right) \le \prod_{k=1}^{T} \frac{1}{2} = \frac{1}{2^T}.$$

Therefore, we end up in case 1 at least once in T trials with probability at least $1 - 1/2^T$,
independent of M, i.e., in constant time. This means we efficiently (constant time) get
case 1, and once we have that, we use Shor’s period-finding algorithm on a quantum
computer, which is $O(\log^3(M))$, to compute the period and produce x. After we have
x, another $O(\log^3(M))$ algorithm, GCD (on an ordinary computer), gets us q, our factor of M.
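The whole reduction can be rehearsed on small M with a classical stand-in for the quantum step. In the Python sketch below, the names `classical_order` and `factor`, the trial cap of 32 and the fixed random seed are all assumptions for illustration, and the brute-force order loop merely occupies the place where Shor's period-finding would run. It expects an odd M that is not a prime power, as arranged earlier.

```python
import random
from math import gcd


def classical_order(y, m):
    """Brute-force order of y (mod m); placeholder for quantum period-finding."""
    b, power = 1, y % m
    while power != 1:
        power = (power * y) % m
        b += 1
    return b


def factor(m, trials=32, rng=None):
    """Return a proper factor of m, or None if every trial lands in case 2 or 3."""
    rng = rng or random.Random(0)         # fixed seed only to make runs repeatable
    for _ in range(trials):
        y = rng.randrange(2, m)           # y in {2, ..., m - 1}
        q = gcd(m, y)
        if q > 1:                         # third easy case: y shares a factor with m
            return q
        a = classical_order(y, m)
        if a % 2 == 1:                    # case 3: odd order, try another y
            continue
        x = pow(y, a // 2, m)             # x = y^(a/2) mod m
        if x == m - 1:                    # case 2: x ≡ -1 (mod m), try another y
            continue
        return gcd(m, x - 1)              # case 1: x meets the sufficient condition
    return None


print(factor(15), factor(21), factor(91))
```

Each call retries with a fresh y whenever it lands in case 2 or case 3, mirroring the repeated-trials argument above.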


25.6 The Time Complexity of $f(x) = y^x \pmod{M}$

There is one final detail we have not addressed. Shor’s quantum period-finding was
only as fast as its weakest link. Since the algorithm is $\log^3 M$ only counting the logic
exterior to the quantum oracle, $U_f$, then any f which has a larger growth rate would
erode Shor’s performance accordingly. In other words, it has relativized exponential
speed up. If it is to have absolute speed-up over the classical case we must show that
the oracle, itself, is polynomial time in log M. I’m happy to inform you that in the case
of the factoring problem, the function $f(x) = y^x \pmod{M}$ is, indeed, $O(\log^4 M)$.




25.7 The Complexity Analysis 


$y \in \mathbb{Z}_M$ and we also saw that a < M, so we only need to consider computing $y^x$ for
both y, x < M. Since $M < M^2 \le N = 2^n$, we can express both x and y as a sum
of powers-of-2 with, at most, n = log N terms. Let’s do that for x:

$$x = \sum_{k=0}^{n-1} x_k 2^k,$$

where the $x_k$ are x’s base-2 digits. So (all products taken mod-M)

$$y^x = y^{\sum_{k=0}^{n-1} x_k 2^k} = \prod_{k=0}^{n-1} y^{x_k 2^k}.$$

However long it takes us to compute the general factor, $y^{x_k 2^k}$, we need to repeat it
n times, and when those n factors are computed, we multiply them together using
(n − 1) multiplications. There are two parts, taken in series, to this computation:

1. n × [complexity of $y^{x_k 2^k}$ (mod M)]

2. (n − 1) × [complexity of a multiplication (mod M)]

The slower (not the product) of the two will determine the overall complexity of $y^x$
(mod M).

Preliminary Note. The computational complexity of integer multiplication is
$O(\log^2 X)$, where X is the larger of the two numbers. For us, each product is bounded
by N > M, so integer multiplication costs us, at most, $O(\log^2 N)$.


25.7.1 Complexity of Step 1 


$$y^{x_k 2^k} = \left( y^{2^k} \right)^{x_k}$$

and in the process of computing $y^{2^k}$ we would end up computing $y^{2^{k-1}}$, so, in order
to avoid repeated calculations from factor-to-factor, we first compute an array of the
n factors

$$\left\{ y^{2^k} \right\}_{k=0}^{n-1} = \left\{ y,\ y^2,\ y^4,\ y^8,\ \ldots,\ y^{2^{n-1}} \right\}, \quad \text{all mod-}M.$$

Starting with the first element, y, each element in this array is the square of the
one before. That’s a total of n − 1 multiplications (we get y for free). Thus, to
produce the entire array it costs O(log N) multiplications, with each multiplication
(by above note) ∈ $O(\log^2 N)$. That’s a total complexity of $O(\log^3 N)$. This is done
once for each factor in the product

$$\prod_{k=0}^{n-1} \left( y^{2^k} \right)^{x_k}.$$

To complete the computation of each of the n factors, we raise one of our pre-computed
$y^{2^k}$ to the $x_k$ power. That’s $x_k$ multiplications for each factor. Wait a minute: $x_k$
is a binary digit, either 0 or 1, so this is nothing other than a choice; for each k we
tag on an if-statement to finish off the computation for that factor. Therefore, the
computation of each factor remains $O(\log^3 N)$.

There are n factors to compute, so this tags on another log N magnitude to the
bunch, producing a final cost for step 1 of $O(\log^4 N)$. However, we have not done the
big Π product yet....


25.7.2 Complexity of Step 2 


Each binary product in the big Π is $O(\log^2 N)$. The big Π has n − 1 such products,
so the final product (after computing the factors) is $O((n-1) \log^2 N) = O(\log^3 N)$.


Combining Both Results 


The evaluation of all n factors, O(log* N), is computed in series with the final prod- 
uct, O(log? NV), not nested, so the slower of the two, O(log’ N), determines the full 
complexity of the oracle. Note that this was a lazy and coarse computation, utilizing 
simple multiplication algorithms and a straightforward build of the f(x) = y* (mod 
M), function, and we can certainly do a little better. 


As we demonstrated when covering Shor’s algorithm, the relationship between $M$
and $N$ ($N/2 < M^2 \le N$) implies that this is equal to $O(\log^4 M)$.
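
The step from $N$ to $M$ can be sketched as follows, using only the bracketing of $M^2$ between $N/2$ and $N$ and taking logarithms base 2:

$$M^2 \le N \;\Rightarrow\; 2\log M \le \log N, \qquad \frac{N}{2} < M^2 \;\Rightarrow\; \log N < 2\log M + 1,$$

so $\log N$ and $\log M$ differ by at most a constant factor (for $M \ge 2$), and $O(\log^4 N) = O\left((2\log M + 1)^4\right) = O(\log^4 M)$.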


25.7.3 Absolute Complexity of Shor’s Factoring


(This is also stated in the lecture on Shor’s quantum period-finding.)


We have now shown that the oracle can be built with an efficiency that is at least
$O(\log^4 M)$. Shor’s period-finding had a relativized complexity of $O(\log^3 M)$, due to
a constant time circuit sampling that contained four computational components in
series,


• $H^{\otimes n}$: $O(\log M)$,
• the oracle: $O(\log^4 M)$,
• QFT: $O(\log^2 M)$, and
• classical post-measurement processing (EA/CF): $O(\log^3 M)$.


The bottleneck is the oracle, at $O(\log^4 M)$, which we will use as our polynomial time
complexity, proving the absolute speed-up of quantum factoring. We acknowledge the
existence of faster oracles than the construction provided above, which improve the overall
algorithm accordingly.




List of Figures 


After measuring one location, we know them all
After measuring one location, we don’t know much
a few numbers plotted in the complex plane
visualization of complex addition
the connection between cartesian and polar coordinates of a complex number
conjugation as reflection
modulus of a complex number
$e^{i\theta}$ after $\theta$ seconds, @ 1 rad/sec
multiplication: moduli multiply and args add
CNG High WOON 2 kb ee BER me Ee OS EE Os 46 
Three of the fourth roots-of-unity (find the fourth)
A vector in $\mathbb{R}^2$
Vector addition in $\mathbb{R}^2$
Scalar multiplication in $\mathbb{R}^2$
Orthogonal vectors
A vector in $\mathbb{R}^3$
A vector expressed as linear combination of $\hat{x}$ and $\hat{y}$
A vector expressed as linear combination of $b_0$ and $b_1$
A vector expressed as linear combination of $c_0$ and $c_1$
Dot-product of the first row and first column yields element 1-1
Dot-product of the second row and second column yields element 2-2
Minor of a matrix element
The numerator of Cramer’s fraction
The Cauchy sequence $\left\{1 - \frac{1}{k}\right\}_{k=2}^{\infty}$ has its limit in [0, 1]




4.2 The Cauchy sequence $\left\{1 - \frac{1}{k}\right\}_{k=2}^{\infty}$ does not have its limit in (0, 1)
4.3 Triangle inequality in a metric space.svg from Wikipedia
4.4 A “3-D” quantum state is a ray in its underlying $\mathcal{H} = \mathbb{C}^3$
4.5 Dividing a vector by its norm yields a unit vector on the same ray
5.1 T mapping a vector v in $\mathbb{R}^2$ to a w in $\mathbb{R}^3$
5.2 A scaling transformation
5.3 Projection onto the direction $\hat{z}$, a.k.a. $\hat{x}_3$
5.4 Projection onto an arbitrary direction $\hat{n}$
5.5 Rotation of $\hat{x}$ counter-clockwise by $\pi/2$
5.6 Rotation of $\hat{y}$ counter-clockwise by $\pi/2$
6.1 Classical angular momentum
6.2 A classical idea for spin: A 3-D direction and a scalar magnitude
6.3 Polar and azimuthal angles for the (unit) spin direction
6.4 A soup of electrons with randomly oriented spins
6.5 The z-projection of one electron’s spin
6.6 The Classical range of z-projection of spin
6.7 The measurements force $S_z$ to “snap” into one of two values.
6.8 Near “vertical” spin measurements give illegally accurate knowledge of $S_x$, $S_y$ and $S_z$
6.9 Plans for a follow-up to experiment #1
6.10 Results of follow-up measurements of $S_z$ on the $+z$ ($|+\rangle$) and $-z$ ($|-\rangle$) groups
6.11 Experiment #2: $|+\rangle_z$ electrons enter an $S_x$ measurement apparatus.
6.12 The input state for experiment #2
6.13 The x-projection of spin
6.14 Viewed from top left, the classical range of x-projection of spin.
6.15 A guess about the states of two groups after experiment #2
6.16 $|-\rangle$ electrons emerging from a group of $|+\rangle$ electrons
6.17 The destruction of $|+\rangle$ $S_z$ information after measuring $S_x$
6.18 A spin direction with polar angle $\theta$ from $+z$, represented by $|\psi\rangle$
6.19 The prepared state for experiment #3, prior to measurement
6.20 Classical, or semi-classical expectation of experiment #3’s measurement
6.21 Probabilities of measuring $|+\rangle$ from starting state $|\psi\rangle$, $\theta$ from $+z$
19.1 Graph of the quintessential periodic function $y = \sin x$




19.2 The function $y = \tan x$ blows-up at isolated points but is still periodic (with period $\pi$)
19.3 Graph of a function defined only for $x \in [-1, 3)$
19.4 Graph of a function defined everywhere, but whose support is $[-1, 3]$, the closure of $[-1, 0) \cup (0, 3)$
19.5 A periodic function that can be expressed as a Fourier series
19.6 A function with bounded domain that can be expressed as a Fourier series (support width = $2\pi$)
19.7 A low frequency ($n = 1$: $\sin x$) and high frequency ($n = 20$: $\sin 20x$) basis function in the Fourier series
19.8 $f(x) = x$, defined only on bounded domain $[-\pi, \pi)$
19.9 $f(x) = x$ as a periodic function with fundamental interval $[-\pi, \pi)$
19.10 First 25 Fourier coefficients of $f(x) = x$
19.11 Graph of the Fourier coefficients of $f(x) = x$
19.12 Fourier partial sum of $f(x) = x$
19.13 Fourier partial sum of $f(x) = x$
19.14 Fourier partial sum of $f(x) = x$ to $n = 1000$
19.15 $f(x)$ has bounded domain, but its Fourier expansion is periodic.
19.16 $f = 10$ produces ten copies of the period in $[-.5, .5)$
19.17 $f = .1$ only reveals one tenth of period in $[-.5, .5)$
UL COR OPA bg ee a ee hE Bred BE HOE HG OD OS 564 
SU OS SN nc ea eg bel de kee be CRS SER ES ERS 564 
20.3 Square-integrable wavefunctions from Wikipedia StationaryStatesAnimation.gif, leading to the absolutely integrable $\psi^2(x)$
20.4 A simple function and its Fourier transform
20.5 Interpretation of $\sigma$ from Wikipedia Standard_deviation_diagram
20.6 Graph of the Dirac delta function
20.7 Sequence of box functions that approximate the delta function with increasing accuracy
20.8 Sequence of smooth functions that approximate the delta function with increasing accuracy
20.9 $\mathcal{F}[1]$ is a 0-centered delta function
20.10 $\mathcal{F}[\cos x]$ is a pair of real-valued delta functions.
20.11 $\mathcal{F}[\sin x]$ is a pair of imaginary delta functions
20.12 Area under $|f|^2$ and $|F|^2$ are equal (Plancherel’s Theorem)
SU Teenie) atid lie apettati. sk neds hee Ewe eB eee ee 576 


20, lagm( je) aid tts spectrin <. ge. bk ede Oe ed ws dw we BS 576 


20.15 A normalized Gaussian with $\sigma^2 = 3$ and its Fourier transform
20.16 A more localized Gaussian with $\sigma^2 = 1/7$ and its Fourier transform
21.1 Continuous functions on a continuous domain
21.2 Sampling a continuous function at finitely many points
21.3 Primitive 5th root of 1 (the thick radius) and the four other 5th roots
21.4 The DFT of a non-periodic vector
21.5 The spectrum of a periodic vector
21.6 The spectrum of a very pure periodic vector
21.7 Going from an 8-element array to two 4-element arrays (one even and one odd)
21.8 Decomposing the even 4-element array to two 2-element arrays (one even and one odd)
21.9 Breaking a two-element array into two singletons
21.10F inal positions of fo should be next to fy. ............02.-. 602 
21.11 Singletons are their own DFTs
21.12 Bit-reversal reorders an eight-element array.

2 1 =¢ Pelt 
21.13 [Fee], = Fede] cee Fedo} ig OE 605 
4 2 a ae | até 

21.14[ Ff ip = Fe). can Fro), eae ee 606 
23.1 The spectrum of a vector with period 8 and frequency 16 = 128/8
Zo.2 sinlox) and We epectidm 2.2.2.6 26 be ee eee ee ee ee: 640 
23.3 The spectrum of a purely periodic vector with period 8 and frequency 16 = 128/8
23.4 Graph of two periods of a periodic injective function.
23.5 Example of a periodic function that is not periodic injective
23.6 We add the weak assumption that 2(+) a-intervals fit into $[0, M)$
23.7 Typical application provides many a-intervals in $[0, M)$
23.8 Our proof will also work for only one a interval in $[0, M)$
23.9 $N = 2^n$ chosen so $(N/2, N]$ bracket $M^2$
23.10 Eight highly probable measurement results, $cm$, for $N = 128$ and $a = 8$
23.11 Easy case covers $a \mid N$
23.12 $[0, N)$ is the union of distinct cosets of size $a$




23.13 The spectrum of a purely periodic vector with period 8 and frequency 16 = 128/8
23.14 Eight probabilities, .125, of measuring a multiple of $m = 16$
23.15 There is (possibly) a remainder for $N/a$, called the “excess”
23.16 $[0, N)$ is the union of distinct cosets of size $a$, except for last
23.17 The final coset may have size $< a$
23.18 If $0 \le x < N - ma$, a full $m + 1$ numbers in $\mathbb{Z}_N$ map to $f(x)$
23.19 If $N - ma \le x < a$, only $m$ numbers in $\mathbb{Z}_N$ map to $f(x)$
23.20 The spectrum of a purely periodic vector with period 10 and frequency 12.8 = 128/10
23.21 A very long line consisting of $a$ copies of $N = 2^n$
23.22 Half-open intervals of width $a$ around each point $cN$
23.23 Exactly one integral multiple of $a$ falls in each interval
23.24 Probabilities of measuring $y_4 = 51$, $y_5 = 64$ or $y_6 = 77$ are dominant.
23.254, all fall in the interval [-a/2,a/2)................04. 683 
23.26 The chord is shorter than the arc length
23.27 $|\sin(x/2)|$ lies above $|x/\pi|$ in the interval $(-\pi, \pi)$
23.28 $|\sin(x/2)|$ lies above $|Kx/\pi|$ in the interval $(-1.5\pi, 1.5\pi)$
23.29 Exactly one integral multiple of $a$ falls in each interval
23.30 $N = 2^n$ chosen so $(N/2, N]$ bracket $M^2$
23.31 $|\sin(x/2)|$ lies above $|Lx/\pi|$ in the interval $(-.4714\pi, .4714\pi)$
24.1 A wide shot of the convergents starting at 2/1
24.2 A close-up showing some of the latter convergents.
24.3 First view of convergents
24.4 Second view of convergents
24.5 Third view of convergents
24.6 Fourth view of convergents
24.7 Fifth view of convergents




List of Tables 




